HEREDITY AND ENVIRONMENT


EDWARD L. THORNDIKE 


Institute of Educational Research, Teachers College, Columbia University 


Certain facts published in the Educational Records Bulletin, 
Number 20, of June, 1937, can be used to measure the influence of the 
environment in cases which are important both for theory and practice.
They concern the variations in the Coöperative Test Service
Examination of 1937 of students in the same school grade who have 
studied a certain subject for a certain length of time. 

Thus in the Latin examination in Grade IX, three hundred thirty- 
one pupils who had studied Latin seven months had an average devia- 
tion from their mean of 5.0; five hundred fifty-seven pupils who had 
studied Latin sixteen months had an average deviation of 5.4; one 
hundred seventy-five pupils who had studied Latin twenty-five months 
had an average deviation of 7.4. When all three groups are combined 
with equal weight the average deviation is increased only to 8.0. That 
is, the variation in a group with an average deviation of six months 
(two-thirds of a school year) in length of study of Latin is reduced by 
only twenty-six per cent when this variation is reduced to zero, or 
near zero. 
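Each percentage reduction in this section can be read as the AD of the equally weighted combined group minus the average of the single-length ADs, divided by the AD of the combined group. The Grade IX and Grade XI Latin figures and the French figures below agree with that reading, so the formula is an inference from the numbers, not something the author states. A minimal Python sketch under that assumption (the function name `percent_reduction` is ours); the same formula applies when Q replaces AD:

```python
from statistics import mean


def percent_reduction(within_group_spreads, combined_spread):
    """Per cent by which the spread in score shrinks when the variation in
    length of study is removed (assumed formula, inferred from the figures).

    within_group_spreads: AD (or Q) of each group with one length of study.
    combined_spread: AD (or Q) of all groups pooled with equal weight.
    """
    average_within = mean(within_group_spreads)
    return 100 * (combined_spread - average_within) / combined_spread


# Grade IX Latin: seven, sixteen, and twenty-five months of study.
grade_ix = percent_reduction([5.0, 5.4, 7.4], 8.0)
print(round(grade_ix))  # 26

# Grade IX French: one, two, and three years of study.
french_ix = percent_reduction([8.14, 7.34, 9.22], 9.07)
print(round(french_ix))  # 9
```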

The pupils in Grade X who had studied Latin sixteen, twenty-five, 
and thirty-four months, respectively, showed average deviations of 7.2, 
6.55, and 7.35. The variation when the three groups were combined
with equal weight was 8.5. The reduction of the variation in length of 
study of Latin from six months to zero or near zero causes in this case 
a reduction of seventeen per cent in the variation in test score. 

Using in the same way the facts for pupils in Grade XI who had 
studied Latin twenty-five, thirty-four and forty-three months, respec- 
tively, I find a reduction of only seven per cent. The Average Devia- 
tion for the scores of the combined group with an AD of six months in 
length of study is 7.67, and the average AD for a group with zero or
nearly zero variation is 7.15. 

If Q (the semi-interquartile range) is used as the measure of varia- 
tion in test score, the twenty-six, seventeen and seven are replaced by 
twenty-one, twenty-three, and seven. 
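Q is half the distance between the first and third quartiles. The sketch below uses Python's `statistics.quantiles` with its default interpolation; the author presumably computed Q from grouped frequency distributions, so this illustrates the definition only and is not a reproduction of his figures. The scores are hypothetical.

```python
from statistics import quantiles


def semi_interquartile_range(scores):
    """Q = (Q3 - Q1) / 2, using the stdlib's default ('exclusive') quartiles."""
    q1, _median, q3 = quantiles(scores, n=4)
    return (q3 - q1) / 2


# Hypothetical scores, for illustration only.
print(semi_interquartile_range([1, 2, 3, 4, 5, 6, 7, 8]))  # 2.25
```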

Students in Grade IX who had studied French one, two, and three
years respectively showed AD’s of 8.14, 7.34 and 9.22, averaging 8.23. 
When all three groups were combined with equal weight, the AD was 
9.07. The reduction was thus only nine per cent. Using Q’s it 
was eight per cent. The three populations numbered four hundred 
fifty-nine, two hundred forty-six, and three hundred seventy-nine, 
respectively. 

Students in Grade XII who had studied French one, two, and three 
years, respectively, showed AD’s of 6.8, 5.2, and 6.0, averaging 6.0. 
When all three groups were combined with equal weight, the AD was 
6.71. The reduction was thus only ten per cent. Using Q’s it was 
fifteen per cent.

Let us now observe the effect of the reduction to zero or near zero 
from a still greater variation in length of study. 

By the courtesy of Dr. Ben D. Wood I have the records in the Latin 
and French examinations from college sophomores who had had one, 
two, three, and four (or more) years of study of the language in ques- 
tion. The populations are large enough for our purposes, being fifty- 
two, fifty, one hundred forty-one and one hundred twenty-nine, 
respectively, for Latin, and three hundred nineteen, five hundred
seventy, two hundred twenty-nine, and thirty-eight for French. 

In the case of Latin, multiplying each frequency by 1.92, 2.0, .71,
and .775, respectively, to give equal representation to each of the four 
amounts of training and computing the average deviation of the com- 
bined group gives 13.4 points. For the separate lengths of study of 
Latin, the average deviations are: One year, 11.0; two years, 10.9; three 
years, 10.6; four years or over, 8.7. The average is thus 10.3.¹ The
variation in score is thus reduced by only twenty-three per cent when 
the variation in length of study is reduced from an average deviation 
of a year to zero or near zero. 
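The multipliers quoted for Latin are consistent with scaling every group to a common total of 100 cases (100/52, 100/50, 100/141, 100/129). The sketch below assumes that reading and adds a hypothetical helper that computes an average deviation from the weighted mean once each score carries its group's weight; the names and the small data set are illustrative only.

```python
def equal_weight_factors(group_sizes, target=100):
    """Multiplier that scales each group's frequencies to the same total
    (a target of 100 is an assumption inferred from the quoted factors)."""
    return [target / n for n in group_sizes]


def weighted_average_deviation(scores, weights):
    """Average deviation from the weighted mean, each score counted `weight` times."""
    total = sum(weights)
    centre = sum(s * w for s, w in zip(scores, weights)) / total
    return sum(abs(s - centre) * w for s, w in zip(scores, weights)) / total


# Latin sophomores with one, two, three, and four (or more) years of study.
latin_sizes = [52, 50, 141, 129]
factors = equal_weight_factors(latin_sizes)
print([round(f, 3) for f in factors])  # [1.923, 2.0, 0.709, 0.775]

# Hypothetical: two groups of scores pooled with equal weight.
group_a, group_b = [10, 12, 14], [20, 22]
scores = group_a + group_b
weights = [1 / len(group_a)] * len(group_a) + [1 / len(group_b)] * len(group_b)
print(round(weighted_average_deviation(scores, weights), 2))  # 4.5
```

Giving each group a total weight of one is proportionally the same as multiplying its frequencies by 100/n.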

The same procedure applied to the eleven hundred fifty-six college
sophomores who had studied French from one to four years gives a
reduction of the variation (measured by the average deviation) of
thirteen per cent (from 11.2 to 9.75). Using the semi-interquartile
range, the reduction is twenty per cent.

¹ If the semi-interquartile range is used, the reduction is from 11.7 for the
combined group to 8.6, or by twenty-six per cent.

The same procedure applied to four hundred twenty-five college 
sophomores who had studied German from one to three years gives a 
reduction of twelve per cent (ten by AD, fourteen by Q). Applied to 
three hundred seventy-three who had studied Spanish from one to 
three years, it gives almost no reduction (two per cent and one-half
per cent).

If a certain amount of variation in maturity and in the action of the 
selective forces which retain a pupil in school and advance him in grade 
is added to the variation in length of study, we can get high-school 
groups also in which the latter is a whole school year. 

The facts for Latin, calling a school year nine months, are as follows: 


              Length of study      Q       AD       n
Grade IX       7 months           4.3     5.0     331
Grade IX      16 months           5.05    5.4     557
Grade X       16 months           5.55    7.2     504
Grade X       25 months           5.45    6.55    642
Grade XI      25 months           6.85    8.0     266
Grade XI      34 months           6.1     7.4     291
Grade XII     34 months           6.0     8.5     101
Grade XII     43 months           6.2     7.2     162
Average                           5.7     6.9


All combined with equal weight, Q = 12.65, AD = 13.2. 

Reduction in the variation in length of study from one year to zero 
or near zero produces a reduction in the variation in score of fifty-two 
per cent (fifty-five by Q, forty-eight by AD). Much of this reduction 
is, however, due to the factors of maturity and selection which are corre- 
lated with the length of study. This is shown by the following facts: 
When groups of two lengths of study in the same grade are combined 
(with equal weight) the variation is hardly increased at all over that of 
either group. So combining the seven-month and sixteen-month 
groups in Grade IX gives a Q of 4.3 and AD of 4.6; combining the 
sixteen-month group and the twenty-five-month group in Grade X 
gives a Q of 5.7 and AD of 7.1; combining the twenty-five-month and 
thirty-four-month groups in Grade XI gives a Q of 5.8 and AD of 6.6; 
and combining the thirty-four-month and forty-three-month groups 
in Grade XII gives a Q of 6.3 and AD of 8.0. 

But when groups of the same length of study in two grades are 
combined (with equal weight), the variation is increased substantially. 
So combining the sixteen-month groups in Grade IX and Grade X 
gives a Q of 6.1 and AD of 6.9; combining the twenty-five-month
groups in Grade X and Grade XI gives a Q of 7.1 and AD of 8.4; 
combining the thirty-four-month groups in Grade XI and Grade XII 
gives a Q of 6.4 and AD of 7.8. Of the reduction of fifty-two per cent 
at least half may safely be credited to the reduction of maturity and 
selection differences when the population of only one grade is used. 
The twenty-six per cent remaining is about the same as the twenty-four 
and one-half per cent found for the college sophomores. 

The facts for French are as follows:


                                 Length of study          Q       AD
Grades X and XI                   7 months               7.75     8.7
Grade X                          16 months               5.75     7.2
Grade XI                         25 months               5.4      6.1
Grade XI                         34 months (or more)     5.85     6.8
Average                                                  6.2      7.2
All combined with equal weight                           8.7     11.0


Reduction in the variation in length of study from one year to zero 
or near zero produces a reduction in the variation in score a bit under 
thirty-two per cent (twenty-eight and one-half by Q, thirty-four and
one-half by AD). If a fourth of this is credited to the maturity and 
selection factors, we have twenty-four per cent to put with the sixteen 
and one-half per cent found for college sophomores. 
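The allowances for maturity and selection are simple arithmetic on the rounded percentages reported above: the Q and AD reductions are averaged, and the share credited to maturity and selection (at least half for Latin, a fourth for French, both the author's own judgments) is then removed. A short sketch reproducing the comparison with the college sophomores; the function name is ours:

```python
def remaining_reduction(by_q, by_ad, share_to_maturity=0.0):
    """Average the Q and AD reductions (per cent), then remove the share
    credited to maturity and selection rather than to length of study."""
    return (by_q + by_ad) / 2 * (1 - share_to_maturity)


# High-school groups, length of study varying by one school year.
latin_hs = remaining_reduction(55, 48, share_to_maturity=1 / 2)       # 25.75
french_hs = remaining_reduction(28.5, 34.5, share_to_maturity=1 / 4)  # 23.625

# College sophomores, where no maturity allowance is made.
latin_college = remaining_reduction(26, 23)   # 24.5
french_college = remaining_reduction(20, 13)  # 16.5

print(latin_hs, latin_college)
print(french_hs, french_college)
```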

From these various and mostly independent determinations we 
have the following reductions in the variation in score with reductions 
in the variation in length of study: 


6 months (two-thirds of a school year) to zero or near zero:
    Latin by AD, 26, 17 and 7; by Q, 21, 23 and 7; average, 17.
    French by AD, 9 and 10; by Q, 8 and 15; average, 10½.

9 months (one school year) to zero or near zero:
    Latin by AD, 23, 24; by Q, 26, 27½; average, 25.
    French by AD, 13, 26; by Q, 20, 21; average, 20.


Consider now the influence of variation in amount of school training 
plus the variation in maturity (and possibly in selective forces) in the 
case of the combination of abilities in English usage, spelling, and 
word knowledge measured by the Coöperative Test Service English
Examination. 

The variation in the various groups is shown in the table below.

Reduction of the variation in education from an average deviation 
of one and a quarter school years to zero or near zero, produces a 
reduction in the variation in English score of only thirteen or fourteen
per cent (twelve by AD, fifteen by Q). After any reasonable allowance 
for the part of the variation caused by maturity, there is very little left 
to be credited to the differences in amount of school training, not a 
tenth of what must be allotted to differences in heredity and in other 
environmental forces than school training. 


                                  AD       Q
 640 students in Grade VIII      6.75     5.8
2083 students in Grade IX        7.5      6.6
2402 students in Grade X         7.53     6.35
2767 students in Grade XI        8.2      6.1
2701 students in Grade XII       7.4      6.35
Average                          7.47     6.2

All five groups combined with approximately equal
weight, AD = 8.5, Q = 7.32.


These facts are important for theory because they show the weak- 
ness of the environment as a cause of differences in abilities where it 
should be strong. Latin, French, English usage, spelling, and vocabu- 
lary are in large measure informational abilities, and presumably more 
susceptible to increase by training than such powers as strength, 
energy, memory, or intelligence. Heredity alone without some special 
training would provide no boy or girl in these schools with any appreci- 
able knowledge of Latin or French. By giving one of a pair of identical 
twins a few hundred hours of teaching of Latin and French which the 
other lacked, a school could create a great difference in them in respect 
of these abilities. Yet variations in these abilities are out of all 
proportion to variations in the amount of school training. 

Let anyone make any reasonable allowances for the fact that the 
time spent in study is spent in part upon things not measured by the 
tests, for the fact that the students in the later years of study may 
spend less time on their Latin or French than they did in the earlier 
years, and for any other facts which make the average difference in 
actual training with the Latin or French in the combined groups less 
than the average difference in years, and make the average difference in 
actual training within a single year group greater than zero. There will 
still remain a notable lack of correlation between amount of training 
and amount of test-ability, a notable failure of the variation in training 
to account for more than a minor share of the variation in the ability 
measured by the test.¹

¹ It must be kept in mind also that these students, especially the college
sophomores, are a superior selection in respect of ability to profit by training, and
that all or nearly all of the college sophomores studying Latin were studying it by
choice, not by requirement. Consequently the variation among those with the
same length of time of study is reduced, probably very much reduced, below the
variation in a random selection of persons of the same age exposed to training in a
language for one, two, three, or four years.

We are subject to a fallacy in arguments about the causation of
individual differences in man, which leads us to suppose that what
training of a special sort does under special conditions training of any
sort does in general. At its worst this fallacy is as bad as arguing that
differences in stature are due to the environment, because one of two
identical twins became fourteen inches shorter than the other by having
his legs cut off! So the resemblances (that is, reduced variation) of
whatever sort among siblings were plausibly attributed to resemblances
in their home environments. The facts for children reared in orphanages
showed how weak this argument really was. So the environment
of infancy when the mind was supposedly at the mercy of any Watson
who might inflict a loud noise upon it when it was viewing a dog or
snake, or of any mother who cuddled it too long or often, turned out in
Bregman’s extensive and careful experiments to be a very feeble
modifier. It has been plausible to attribute the bulk of various
resemblances and differences to schooling because we did not look to
see just what the actual variations in schooling do do. The more one
looks the more one is confronted by failures of the environment to do
what has been expected of it.

The facts are important for practice because they may improve our
notions of what is to be expected concerning the relative contribution
of the genes and the environment in such matters as acquiring political
power, learning a skilled trade, succeeding in a profession, or making
money. These achievements are, in a sense, halfway between things
like intelligence and sanity and things like learning Latin and French.
They may prevent us from the folly of assuming that there is a political
ability or a legal ability or a money-making ability comparable to the
intellectual ability measured by our intelligence tests, fairly constant
through the changes of the years, with a variation among men four-
fifths of which is caused by variations in the genes; and also from the
folly of assuming that political experience alone can create political
sagacity, that apprenticeship can put skill in a man as easily as custom
puts clothes on him, or that most business failures are caused by lack
of capital.
MOTOR EFFICIENCY OF THE EYE AS A FACTOR IN
READING* 


MILES A. TINKER 
University of Minnesota 


The rôle played by oculo-motor behavior in the reading process
has received considerable emphasis ever since the pioneer work, over
sixty years ago, on eye movements in reading. Much has been made
of the need for developing desirable eye-movement habits as a basis
for satisfactory progress in reading proficiency and to relieve reading
disability. In fact, several of the published programs for remedial
reading include directions for eye-movement training. Such programs
are based upon two fundamental assumptions: (1) Oculo-motor control
is an important factor in reading proficiency, and (2) “proper” patterns
of saccadic or reading movements developed through training will
promote improvement in reading status. One investigator (1) who
represents this group of writers has stated clearly the alleged importance
of eye-movement pacing as a method of improving reading
performance. He said that “such training improved eye-movement
habits, which in turn increased reading efficiency.” Presumably,
oculo-motor control is considered to be an important determinant of
reading proficiency. Peripheral factors, therefore, would seem to
condition the reading process to a large degree.

Opposed to this point of view are those who maintain that reading
is chiefly a central process. Reading status is considered by them
to be dependent upon proficiency of the higher mental processes of
apprehension and assimilation. Eye-movement patterns are considered
to be not explanations of reading ability but merely symptoms
or expressions of efficiency in comprehending and assimilating during
the reading. It is held that since the eye movements follow and reflect
the “mind” they cannot control what it does.

If it could be established that oculo-motor control is intimately
related to level of reading proficiency, the results would support the
view that peripheral factors are important. The purpose of this study
is: (1) To measure accuracy of oculo-motor control as manifested by
accuracy of voluntary fixation during successive perceptual acts, (2)
to measure speed of convergence and divergence during successive
visual fixations, and (3) to ascertain the relation of the above measures
to speed of reading and eye-movement measures of reading.

* The expenses of this study were met by a research grant from the Graduate
School, University of Minnesota.

A total of sixty-four university sophomores were employed as
subjects in the experiment. The following measurements were made:

(1) The Minnesota Speed of Reading Test was given with standard
time limits. The score was the number of paragraphs read correctly.

(2) An eleven-line paragraph of easy prose was read while the eye
movements of the readers were photographed. The material was
printed in Scotch Roman, twenty-five-pica line width, ten-point
type with two-point leading, on eggshell paper stock. Records were
tabulated for the second through the tenth line. The first line was
omitted because it takes about one line to adjust to the reading; the
last line was not full length. The following eye-movement measures
resulted: mean number of fixations per line, mean pause duration,
mean number of regressions per line, and mean perception time (pause
duration multiplied by frequency) per line. The time unit used was
one-fiftieth second.

(3) Successive fixations were photographed while the subject read numbers
located at the ends of ten blank horizontal lines. The lines were twenty-five
picas (4⅙ inches) long including the digits and were separated by
one-fourth inch of space. This is situation A. The order of the reading
was left end to right, to left, to right, etc. The first ten fixations
that yielded legible records were used. This produced a situation in
which half of the moves were somewhat similar to reading the first
word at the beginning of a line after the return sweep from the end of
the preceding line. The number of readjustments at each fixation
was tabulated, and the amount of each readjustment was measured
in hundredths of an inch (on the film) and tabulated.

(4) In the same way (situation B), the accuracy of fixation was
obtained while the subject read numbers at the ends of blank lines thirty-six
picas (six inches) long. This is the width of a long line of print, while
the twenty-five picas represented a medium width of line.

(5) Similarly, accuracy of fixation was measured by fixating
alternately two dots six inches apart horizontally. This is situation C.

(6) Speed of convergence and divergence was recorded photographically
while the fixation was shifted alternately from a focal point
fifteen inches away to one eight inches away. This is situation D. Legible
records were obtained on only thirty-six subjects in this series of
measurements. There were eight to ten measurements on each subject.
Although results are given for this small group, no emphasis is placed
upon the findings.
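The four eye-movement measures in (2) can be derived from a per-line record of pauses. The sketch below assumes a hypothetical record layout (one tuple per tabulated line: that line's pause durations in fiftieths of a second, and its number of regressions) and pools all pauses for the mean pause duration, which is also an assumption. Perception time for a line is the sum of its pauses, which equals its mean pause duration multiplied by its number of fixations.

```python
from statistics import mean


def eye_movement_measures(lines):
    """Per-line means from one reader's photographic record.

    lines: one entry per tabulated line of print (hypothetical layout);
    each entry is (pause_durations, regression_count), with every pause
    duration in fiftieths of a second.
    """
    fixations = [len(pauses) for pauses, _ in lines]
    regressions = [count for _, count in lines]
    all_pauses = [p for pauses, _ in lines for p in pauses]  # pooled (assumption)
    perception = [sum(pauses) for pauses, _ in lines]  # duration x frequency per line
    return {
        "fixations_per_line": mean(fixations),
        "pause_duration": mean(all_pauses),
        "regressions_per_line": mean(regressions),
        "perception_time_per_line": mean(perception),
    }


# Hypothetical two-line record, for illustration only.
record = [([10, 12, 11], 0), ([9, 13, 12, 10], 1)]
print(eye_movement_measures(record))
```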

In the photographic work the customary precautions were taken to 
help the subject adapt to the experimental situation: the
operation of the camera was explained and practice series were run.


RESULTS AND DISCUSSION 


The group means and standard deviations of the measures are
given in Table I. The first four eye-movement measures are for reading
prose; the following six for accuracy of fixation; and the final
one for speed of convergence-divergence. Note the relatively small
variability for pause duration and the relatively large variability for
regression frequency and certain of the accuracy of fixation measures.

TABLE I. MEANS AND STANDARD DEVIATIONS OF THE MEASURES EMPLOYED

Measure used                     Unit of measurement    N     Mean    S.D.
Minn. Speed of Reading           Paragraphs             64    28.4     5.8
E. M. Fixation Frequency         Number per line        62     7.3     1.5
E. M. Pause Duration             1/50 second            62    11.2     1.8
E. M. Perception Time            1/50 second            62    80.7    21.2
E. M. Regression Frequency       Number per line        62     1.0     0.7
E. M. Adjustments A              Total number           63     9.8     3.1
E. M. Adjustments B              Total number           63    10.4     4.5
E. M. Adjustments C              Total number           63     9.2     2.4
E. M. Amount of Adjustment A     1/100 inch             63     1.9     0.9
E. M. Amount of Adjustment B     1/100 inch             63     2.8     2.1
E. M. Amount of Adjustment C     1/100 inch             63     1.6     0.7
E. M. Convergence D              … second               36     6.9     …

The odd-even reliability coefficients (not raised) for eye movements 
in reading were: .70 for fixation frequency, .67 for regression frequency, 
.77 for pause duration, and .85 for perception time. 
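"Not raised" is read here as meaning that the odd-even (half-test) coefficients were not stepped up to full length with the Spearman-Brown formula, r = 2r_half / (1 + r_half); that interpretation is ours, not stated in the text. The sketch shows what the stepped-up values would be:

```python
def spearman_brown(r_half, length_factor=2):
    """Estimated reliability of a test `length_factor` times as long as the
    one that yielded r_half (Spearman-Brown prophecy formula)."""
    return length_factor * r_half / (1 + (length_factor - 1) * r_half)


odd_even = {
    "fixation frequency": .70,
    "regression frequency": .67,
    "pause duration": .77,
    "perception time": .85,
}
for measure, r in odd_even.items():
    print(f"{measure}: {r:.2f} -> {spearman_brown(r):.2f}")
```

Stepped up in this way, the four coefficients would become about .82, .80, .87, and .92.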

The intercorrelations of the various eye-movement measures and
the speed of reading scores are given in Table II. There is the expected
small negative correlation (−.27 to −.53) between speed of reading
and the eye-movement measures on prose reading. The correlations
of .75, .71, and .58 between number of readjustments and amount of
readjustment indicate that those individuals who are more frequently
inaccurate in their fixations on numbers and dots also tend to be inaccurate
by a greater amount. Where digits are read during the fixation,
the two measures of accuracy of fixation are more intimately correlated
with each other (Situations A and B). It is noteworthy that not
more than one fixation out of ten was made without at least one
readjustment to correct inaccuracy of the fixation. Frequently
two and sometimes three readjustments were necessary to arrive at
the proper fixation for clear perception of the digit or dot. The number
or amount of readjustment in one situation showed only a small
correlation (.24 to .62) with either measure of accuracy of fixation in
the other two situations. The intercorrelations tend to be slightly
higher, however, where reading of numbers was involved in the two
situations compared (.43 and .62 vs. .33 and .24). Accuracy of fixation
shows no correlation with speed of convergence-divergence.

TABLE II. INTERCORRELATION OF VISUAL FIXATION AND READING SCORES

                             Reading  A:     B:     C:     A:     B:     C:     D:
Measures used                speed    No.    No.    No.    Amt.   Amt.   Amt.   Time
Fixation frequency           −.531    .240   .059   .268   .070  −.057   .252  −.194
Pause duration               −.274    .156   .029  −.013   .250   .217  −.092  −.156
Perception time              −.575    .297   .051   .207  −.015   .041   .150  −.244
Regression frequency         −.351    .354   .036   .328   .136  −.092   .249  −.104
A: Number of readjustments   −.121           .434   .332   .75    …      …     −.002
B: Number of readjustments   −.050    .434          .114   …      .71    …     −.154
C: Number of readjustments   −.236    .332   .114          …      …      .580  −.198
A: Amount of readjustment    …        .75    …      …             .618   .240  −.072
B: Amount of readjustment    −.019    …      .71    …      .618          .014  −.089
C: Amount of readjustment    …        …      …      .580   .240   .014         −.200
D: Time                      .266    −.002  −.154  −.198  −.072  −.089  −.200

No. = number of readjustments; Amt. = amount of readjustment; D = time of
convergence-divergence.

Our main interest is concerned with the relation between oculo-motor
efficiency in fixation and the measures of reading performance. The
correlations between speed of reading and accuracy of visual fixations
range from −.02 to −.34 with an average coefficient of −.15. Speed
of reading correlates .27 with speed of convergence-divergence. Only
one of these coefficients (−.34) is sufficiently stable to be significant.
It is safe to conclude that very little or no correlation exists between 
these variables. 

When accuracy of fixation is correlated with each of the eye-movement
measures, shown in the first four rows of the table, the coefficients
vary from −.19 to +.35 with an average of +.09. There is
clear indication here that oculo-motor efficiency, either in fixating
objects at the end of a line-length sweep of the eyes, or in speed of
convergence-divergence, bears no significant relation to proficiency in
reading. Again only one of the coefficients satisfies the criterion of
adequate statistical stability.

Vernon has suggested that oculo-motor efficiency as manifested
in accuracy of voluntary fixation is intimately related to frequency 
and duration of the pauses in reading and to regression frequency. 
Her subjects seemed to fall into two classes: (1) Those who read a line of 
print with many fixations of brief duration and with some regressions; 
and (2) those who read a line with few fixations of long duration. In 
summarizing she states that “those whose ocular-motor ability, as
constituted by steadiness of voluntary fixation and accuracy of 
voluntary movement and of return movement in reading was great, 
made a small and regular number of fixation pauses of long, irregular 
duration, while those whose ocular-motor ability was relatively small 
made a large and irregular number of fixation pauses of short and 
regular duration.” 

Our results in Table II show no consistent trend of correlation 
between any measure of oculo-motor efficiency and fixation frequency, 
pause duration, or regression frequency. The largest coefficient 
(+.35 ± .08) is between number of readjustments (A) and number of
regressions. All others are insignificant. There is, therefore, no 
support in our data for Vernon’s conclusions. 
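The ± .08 attached to the largest coefficient matches the classical probable error of r, PE = 0.6745 (1 − r²)/√N, with N = 62. The text does not state its stability criterion; the sketch below assumes the common convention of the period that r must be at least four times its probable error, which agrees with the coefficients the text treats as stable (−.34) and unstable (.27, with N = 36). The function names and the value N = 63 for the −.34 coefficient are our assumptions.

```python
import math


def probable_error_r(r, n):
    """Classical probable error of a Pearson r: 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


def is_stable(r, n, multiple=4):
    """Assumed criterion: |r| must be at least `multiple` probable errors."""
    return abs(r) >= multiple * probable_error_r(r, n)


print(round(probable_error_r(.35, 62), 2))  # 0.08
print(is_stable(.35, 62))   # True
print(is_stable(-.34, 63))  # True
print(is_stable(.27, 36))   # False
```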

Vernon employed only nine subjects and made an extensive 
examination of individual records. A fairer test of her conclusions 
would be, perhaps, to examine the oculo-motor efficiency of the 
extremes of our group in relation to the reading measures. Therefore, 
the fifteen most efficient and the fifteen least efficient in visual fixation 
were chosen for the comparisons shown in Table III. Each score 
represents an average for fifteen subjects. Note that low perception 
time score shows faster reading. Examination of the two parts of this
table reveals the following trends: (1) Greater oculo-motor efficiency 
in fixation is accompanied by faster reading in the reading test and in 
the selection photographed (perception time). (2) In general, slightly 
fewer fixations and regressions are made by the group with greater 
oculo-motor efficiency, and the fixation pauses tend to be slightly 
shorter for the same groups. (3) Although the trends seem to be 
fairly consistent from one reading measure to another, no difference is 
striking. (4) When the extreme cases of efficiency in oculo-motor 
adjustment in voluntary fixation are considered, therefore, there is a 
general trend within these small groups for those with better motor 
control of the eyes to read faster, and to make shorter and fewer 
fixations and regressions. 
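The extreme-group comparison of Table III amounts to ranking subjects on one oculo-motor score, taking the fifteen at each end, and averaging the reading measures within each group. A sketch with a hypothetical record layout (the dictionary keys and the data are illustrative, not the study's):

```python
from statistics import mean


def extreme_group_means(subjects, selector, measures, k=15):
    """Mean of each measure for the k most and k least efficient subjects.

    subjects: list of dicts, one per subject (hypothetical layout).
    selector: key of the oculo-motor score; a higher number or amount of
              readjustment means less efficient control.
    """
    if 2 * k > len(subjects):
        raise ValueError("the two extreme groups would overlap")
    ranked = sorted(subjects, key=lambda s: s[selector])
    groups = {"efficient": ranked[:k], "inefficient": ranked[-k:]}
    return {
        name: {m: mean(s[m] for s in members) for m in measures}
        for name, members in groups.items()
    }


# Hypothetical subjects; k=2 instead of 15 to keep the example small.
subjects = [
    {"readjustments_A": 5, "reading_test": 30, "fixations": 6.5},
    {"readjustments_A": 14, "reading_test": 24, "fixations": 8.5},
    {"readjustments_A": 7, "reading_test": 28, "fixations": 7.5},
    {"readjustments_A": 12, "reading_test": 26, "fixations": 7.0},
]
result = extreme_group_means(subjects, "readjustments_A",
                             ["reading_test", "fixations"], k=2)
print(result)
```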

In another report (…, p. 96), Vernon, in discussing the relation of
oculo-motor efficiency to photographic measures of reading,
makes the following statement: “But whatever the origin, it seems
clear that the use of frequent short pauses, or less frequent long pauses,
and the tendency to overrun the word and then regress to it, are permanent
ocular-motor habits, unconnected with perception and assimilation
of the reading content.” Our findings are definitely contrary
to this conclusion. With the groups showing extremely accurate and 
extremely inaccurate oculo-motor control, the relationship between 
motor efficiency of the eyes and speed of reading (indicating perception 
and assimilation) is just as great and just as consistent as the relation- 
ship between motor ability of the eyes and pause duration or fixation 
and regression frequency. It is clearly evident that whatever slight
relationship is present is definitely associated with reading proficiency.

When extremes of speed of convergence-divergence were considered 
in a similar manner, equivocal results were found. 

What bearing do the results of this study have upon the question 
of peripheral versus central determinants of reading proficiency? 
When a representative sample, i.e., our group taken as a whole, is considered,
the absence of any significant correlation between oculo-motor
efficiency in voluntary visual fixation and either measures of reading 
speed or photographic measures of reading ability furnishes no support 
for the contention that variation in motor control of the eyes in volun- 
tary fixation is an important determinant of reading ability. Our 
comparison of the extreme groups does suggest, however, that when- 
ever the motor control of the eyes in visual fixation is decidedly 
inefficient, certain readers may be handicapped in reading. Such
cases would be rare, however. Examination of the individual scores
reveals much overlapping in scores between the members of the two
extreme groups. For example, thirty-three per cent of the persons
with poor oculo-motor adjustment exceed the speed of reading median
of those with efficient motor control of the eyes; and slightly less than
half of the inefficient group have fewer fixation pauses than the median
of the group with good oculo-motor adjustment.

TABLE III. COMPARISON OF EXTREME GROUPS IN ACCURACY OF VISUAL FIXATION
WITH READING SPEED AND EYE-MOVEMENT MEASURES
(N = 15 in Each Group)

Motor Efficiency = Number of Readjustments
                                                 Mean score
Selection*  Group         Number of     Reading  Number of  Pause     Perception  Number of
                          readjustments test     fixations  duration  time        regressions
A           Inefficient   13.5          25.7     7.9        11.4      90.2        1.3
A           Efficient      5.7          28.3     7.5        10.9      81.6        0.9
B           Inefficient   16.3          27.0     7.4        11.7      86.3        1.0
B           Efficient      6.1          29.3     6.9        11.0      75.5        0.9
C           Inefficient   11.8          23.8     8.3        11.3      93.7        1.5
C           Efficient      5.9          27.9     6.8        11.1      75.2        0.9

Motor Efficiency = Amount of Readjustment
                                                 Mean score
Selection*  Group         Amount of     Reading  Number of  Pause     Perception  Number of
                          readjustment  test     fixations  duration  time        regressions
A           Inefficient    3.2          27.1     7.2        11.8      84.9        1.0
A           Efficient      0.9          28.6     6.7        10.7      71.2        0.8
B           Inefficient    6.1          27.4     7.3        11.5      84.1        1.0
B           Efficient      1.0          28.3     7.3        11.3      82.6        1.0
C           Inefficient    2.4          24.3     7.8        11.7      90.5        1.3
C           Efficient      0.9          29.1     6.6        11.…      73.4        0.8

* A: Reading single numbers at alternate ends of blank lines twenty-five picas
(4⅙ inches) long.
B: Same for lines thirty-six picas (six inches) long.
C: Alternate fixations of dots thirty-six picas apart.
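The overlap figures compare each member of the less efficient group with the median of the more efficient group. A sketch of that count; the data are hypothetical, and the `above` flag reflects that a higher reading score, but a lower number of fixations, counts as better:

```python
from statistics import median


def percent_beyond_median(group, reference, above=True):
    """Per cent of `group` lying above (or, with above=False, below) the
    median of `reference`."""
    cut = median(reference)
    hits = sum((x > cut) if above else (x < cut) for x in group)
    return 100 * hits / len(group)


# Hypothetical reading-test scores, not the study's data.
inefficient_speed = [20, 22, 25, 30]
efficient_speed = [26, 28, 29, 30, 33]
print(percent_beyond_median(inefficient_speed, efficient_speed))  # 25.0

# Fewer fixations is better, so count those below the reference median.
inefficient_fix = [6, 6.5, 8, 9]
efficient_fix = [6.5, 7, 7.5]
print(percent_beyond_median(inefficient_fix, efficient_fix, above=False))  # 50.0
```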

It is conceivable, in the light of these comparisons, that training 
to improve the accuracy of motor control of the eyes in sweeping from 
one fixation to another would aid an occasional person to improve his 
proficiency in reading. 


SUMMARY AND CONCLUSIONS 


(1) Accuracy of oculo-motor control in visual fixation as related 
to photographic measures of reading and speed of reading was measured 
for a group of sixty-four university sophomores. 

(2) No significant correlation (r) was found between accuracy of 
visual fixation and measures of reading proficiency. 

(3) The conclusions of Vernon concerning the relation of oculo- 
motor proficiency and photographic measures of reading performance 
are to a large degree refuted by the results reported here. 

(4) When extremes alone of the group are compared there appears 
to be a slight and consistent relation between motor efficiency of the 
eyes and reading ability. It is suggested that an occasional person 
may be benefited in reading by training the eyes to greater accuracy in 
sweeping from one fixation to another, such as the back sweep from the 
end of one line to the beginning of the next. 
A COMPARISON OF TWO NEW-TYPE QUESTIONS:
RECALL AND RECOGNITION 


DOROTHY M. ANDREW 
Pennsylvania College for Women 


AND 


CHARLES BIRD 
University of Minnesota 


If we desire data concerning the reliability and relative difficulty 
of recall and recognition examinations which have been given to 
determine college grades, we find none. If we inquire into the effect 
upon course grades produced by correcting recognition questions 
in examinations containing also recall items, we sometimes are told, 
“Grades cannot be changed because the rank order of the students
will remain the same; it is a waste of time, therefore, to allow for
guessing.” Should we ask whether excellent or poor students do
relatively better in recall or in recognition tests, honesty dictates the 
admission of ignorance. Some instructors might silence, but not 
satisfy, students with the answer that both kinds of items are meas- 
uring the same thing and to the same degree, since coefficients of 
correlation between two sets of scores, when corrected for attenuation, 
range from .90 to over 1.00. Parenthetically, may it be noted, the 
lack of satisfaction is not explained entirely by ignorance of statistical 
argument. College students, who have struggled with and objected 
to recall items while they have accepted willingly an equal number 
of recognition items, cannot be convinced by a coefficient that the 
two kinds of questions are measuring the same things. In spite of
the existence of more than a baker’s dozen of researches into the
reliability and difficulty of recognition and recall questions, there is 
not one study which has been pointed directly to these problems as 
they affect course grades on the college level. Yet it is on this level 
that we need to know most keenly what is being accomplished by the 
use of new-type questions. Instead of having to deduce results from 
general information tests, we should have facts gathered directly 
from college examinations. 

The majority of existing researches parallel closely the pioneer 
study of Toops. Tests of information have been set up as recall,
five-response, three-response, or two-response recognition, and as
true-false items. In each form identity of phraseology and content
has been the objective. Either the order in which forms have been
presented was varied to probe matters of reliability and difficulty
or the same ends have been sought by group comparisons. In the 
latter procedure, recall tests are given to all subjects on the first day 
and afterward a different form of question is given to each of four 
equated groups (6, 7, 8, and 9). A deviation from this procedure,
one bordering slightly more upon teaching problems, is described 
by Remmers et al. They compared true-false, single-choice, and 
incomplete sentence questions with each other. These questions had 
been contributed by the fifty-six students in a course in General 
Psychology, and were later presented to them as an experimental 
examination. The tests cannot be said to parallel classroom pro- 
cedures, since the students were told an experiment was in progress 
and that in no way would their grade in the course be affected by the 
tests. Not one of these researches, then, utilizes new-type tests as 
achievement indices in college courses. Although results obtained 
under diverse conditions may bear resemblances, a much coveted 
outcome in a field where contradictions balk confidence, our knowledge 
of what new-type examinations are doing is best advanced by study- 
ing these examinations as an intrinsic part of college courses. 

In this investigation, students enrolled in three advanced courses 
were given midterm and final examinations in a routine manner. 
The students were accustomed to tests composed of both single- 
choice recognition questions, in which one answer in four is correct, 
and recall questions requiring the completion of a sentence by a 
single, appropriate word. We, therefore, maintained the regular 
examination procedure but controlled the structure of the tests. 
This was done by using the same number of recognition and recall 
questions in the two halves of the examinations and by reversing the 
form of the questions in an alternate examination form. The recognition
questions of Form A became the recall questions of Form B
and vice versa. Care was taken to secure identity of phraseology in 
the two types of questions. The samples which follow illustrate the 
technique used. 


In paresis, the atrophy of the cerebral cortex is greatest in the (1) parietal;
(2) frontal; (3) temporal; (4) occipital; lobe. (Recognition.)

In paresis, the atrophy of the cerebral cortex is greatest in the
______ lobe. (Recall.)

The type of objective question which generally is most reliable and valid
is the (1) multiple choice; (2) analogy; (3) completion; (4) matching; type.
(Recognition.)

The type of objective question which generally is most reliable and valid
is the ______ type. (Recall.)

In all kinds of competitive performance there are at least two social factors
operating; one we call social facilitation and the other (1) inertia; (2) rivalry;
(3) auditory reinforcement; (4) projection. (Recognition.)

In all kinds of competitive performance there are at least two social factors
operating; one we call social facilitation and the other ______.
(Recall.)

The courses were taught by the writers and were traditionally 
labelled Abnormal Psychology, Social Psychology, and Educational 
Psychology. Respectively, one hundred eighty-five, eighty-three,
and forty-eight students were examined by both midterm and final 
tests. The above questions have been chosen from each of these 
three fields. In Educational Psychology the students receiving 
Form A and those receiving Form B of the examination were matched 
in terms of grades already earned in psychology courses. Since the 
other two courses enrolled a large number of students, the examination 
forms were passed out alternately at the midterm, but the form then 
received determined also the structure of the final test. 

To clarify the procedure followed in Abnormal Psychology and 
Social Psychology courses, a more detailed description will be given. 
Except for difference in content, the form and the number of questions 
used were the same in both courses. The midterm tests consisted 
of eighty questions, forty recognition and forty recall items. Half of 
the group received Form A and the other half received Form B. 
These forms actually covered the same subject-matter, but the recog- 
nition questions of Form A were expressed as recall questions in 
Form B. Care was taken to avoid any possible influence being 
exerted by the order of questions. The recognition items were 
given first in both forms, but these items appeared in the same order 
as the recall items of the other form. This arrangement of items
permits, then, a comparison of the relative difficulty of recognition 
and recall items, providing the assumption be granted that by passing 
out the examination forms alternately equal groups of students have 
been secured. 

The final examinations consisted of one hundred sixty items, 
eighty recognition questions comprising the first and eighty recall 
questions the second part. But not all of these items were new to the 
student. He received again, as the even numbered items of his 
examination, the midterm questions, but this time what had been 
met as recall questions now appeared as recognition questions and 
vice versa. This technique was followed to discover whether the
same student would find the two types of questions of unequal diffi- 
culty and whether he would profit more from having met one or the 
other kind of item. The eighty new questions, forty of each type, 
served as repetitions of the investigation and for making contrasts. 

In the Educational Psychology class a simpler procedure was 
made possible by equating students for learning ability. The midterm
tests also consisted of two forms, each having twenty-seven
recognition and twenty-seven recall items, and the final tests had
ninety-five of each type of question. No attempt is made, in
connection with this class, to inquire into the pedagogical value of the
two kinds of items. Concern is centered upon their reliability and 
difficulty. 

It should be emphasized that discussion of the midterm tests 
with the students did not extend beyond the customary meaning of 
the scores. This was a regular class procedure. But equal emphasis 
needs to be put upon the necessity of having to admit, two days after 
this test, that an experiment had been in progress. The students 
were not accustomed to receive a number of recall questions equal 
to the number of recognition questions. They were disturbed by 
feelings of inadequacy. Somehow it didn’t seem quite fair to expose 
their academic nakedness. Their remarks at this next class were such 
as to invite an open forum. But at no time were any hints given 
that the same questions would be used again in modified form. It was 
made clear that the final examination would follow the same pattern 
as the midterm test. Nothing would have been gained by indirection, 
for the more astute students had discovered through post-examination 
discussion that other students had met recognition items as recall 
items. 

Recall questions have had, generally, a higher reliability than 
various forms of recognition questions. An exception is found in the 
study of Ruch and Charles. It should be noted that the term 
reliability denotes here a relationship between scores obtained either 
separately on two forms of a test or by securing the sums of odd and
of even items in the same test. We are considering the reliability 
of scores rather than of items. A comparison of the latter is pre- 
sented in a previous study. Under actual examination conditions 
recall tests likewise enjoy greater reliability than recognition tests. 
This is made clear from a perusal of Table I. When subject-matter 
is the same, one exception may be noted; namely, the higher coefficient 
for Social Psychology Form B, recognition, in comparison with that
for Form A, recall. These coefficients are as high as, and perhaps
generally higher than, most which have been reported to date. In
view of the restricted range of ability in these classes, namely, college
and university juniors, seniors, and graduates, the magnitude of the
coefficients indicates that careful preparation of examinations can be
expected to offset selection of talent. Ruch and Stoddard assumed
that their reliability coefficients were higher than coefficients found
by Toops because the latter were adversely affected by a restricted
range of ability. Another reason may be that the examination
content has as much as or more to contribute to reliability.

TABLE I. COEFFICIENTS OF RELIABILITY (ODD-EVEN) BASED UPON CORRECTED
AND UNCORRECTED SCORES IN FINAL EXAMINATIONS

                                 Uncorrected scores            Corrected scores
                                 Correlation  Spearman-Brown   Correlation  Spearman-Brown
Recognition
  Abnormal Psychology A          .62          .76              .59          .74
  Abnormal Psychology B          .68          .81              .59          .75
  Social Psychology A            .57          .73              .48          .65
  Social Psychology B            .72          .83              .78          .87
  Educational Psychology A       .58          .73              .71          .83
  Educational Psychology B       .72          .84              .74          .84
Recall
  Abnormal Psychology A          .80          .89              .80          .89
  Abnormal Psychology B          .70          .82              .70          .82
  Social Psychology A            .59          .74              .59          .74
  Social Psychology B            .89          .94              .89          .94
  Educational Psychology A       .83          .91              .83          .91
  Educational Psychology B       .60          .75              .60          .75
Controversy has prevailed concerning the desirability of correct- 
ing recognition questions to offset the effects of guessing. One 
member of a varying team of investigators once opposed correction
because there appeared evidence warranting the conclusion that
“the marks are made more untrustworthy to some extent by correction
practices now in vogue” (p. 119). Correction was assumed to
lower reliability. In an analysis of the same data published elsewhere,
this investigator joins in informing us, “The increased variability
of the corrected scores would be a real gain in case it accompanied
increased reliability which does not appear to be the case” (p. 99).
Most frequently the differences between reliability coefficients derived 
from corrected and uncorrected scores have been small enough to be 
ignored, where the value of correcting for guessing has been an issue. 
Investigations carried on by Ruch, it is true, more often have yielded 
lower reliabilities following correction. Higher coefficients are not 
lacking in other researches. It was thought useful, therefore, to 
present a set of coefficients based upon corrected scores in this study. 
They are included in Table I. Three coefficients are raised and three 
are lowered by correcting for guessing. The determination of cor- 
rection does not depend either solely or of necessity upon changes in 
reliability. Uniformity of direction of change in the magnitude of 
coefficients is not the rule. 

Recall questions have yielded lower mean scores than recognition 
questions having a similar introductory phrase when general informa- 
tion has been plumbed. Do college students carry their learning 
sufficiently above the threshold of recall to equalize the value of 
these two kinds of questions? Presumably a thorough grasp of 
knowledge would narrow the discrepancy generally anticipated. A 
comparison of recognition scores in Form A of one test with recall 
scores in Form B of another test given simultaneously provides an 
unequivocal answer. Of the twelve contrasts afforded by the two 
examinations given in each of the three courses, all show that ques- 
tions demanding relatively unaided recall are more difficult than 
recognition questions which cover identically the same content. 
Table II represents difference ratios calculated on the basis of recogni- 
tion questions corrected and not corrected for guessing. 

Differences obtained by comparing recall items with uncorrected 
recognition items are exceedingly large and always statistically sig- 
nificant. It is presumed that the prevailing practice in computing 
total scores in new-type tests is to ignore the element of guessing or 
to admit that an examination is not intended to measure knowledge 
entirely. The differences contribute to the understanding of the 
behavior disturbances manifested by college students during and 
after these examinations. Students get little pleasure from being 
self-exposed as fairly proficient in the more infantile or recognitive 
type of memory and as strikingly deficient in the more intrinsically 
determined or recall level. An appreciation of marked differences,
psychologically but not statistically supported, probably underlies
the frequently received but unsolicited admissions that “college
examinations have nearly always called for the recognition of the
right answer so that we have carried our study no further than the
level necessary to pick out what we think is correct.”

TABLE II. DIFFERENCE RATIOS BETWEEN AND RELATIVE VARIABILITIES FOR

The formula used to allow for guessing, namely, corrected score 
equals the number right minus the number wrong divided by the 
number of possible choices minus one, or, upon this occasion, the num- 
ber right minus one-third the number wrong, has been applied to the 
recognition questions. A considerable reduction in the differences and
difference ratios follows, but in only two instances (the first two
comparisons listed in Table II under the course title of Educational
Psychology) are the discrepancies between recall and recognition items
less than the usually acceptable threshold for reliable differences. Do 
these differences, then, represent the degree to which recall questions 
exceed in difficulty matched recognition questions? 
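The correction formula just described can be written out as a short function. This is a sketch: `corrected_score` is a hypothetical name, and omitted items are assumed to count neither right nor wrong, as the formula implies.

```python
def corrected_score(right, wrong, choices=4):
    """Correction for guessing: right - wrong / (choices - 1).

    Omitted items count neither way. With four choices, as in these
    examinations, this is the number right minus one-third the number wrong.
    """
    if choices < 2:
        raise ValueError("a recognition item needs at least two choices")
    return right - wrong / (choices - 1)


# Hypothetical student: 80 four-choice items, 60 right, 15 wrong, 5 omitted.
print(corrected_score(60, 15))  # 55.0
```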

It is questionable whether the intrinsic difference in difficulty 
between recall and recognition items can be established by any prevail- 
ing formula. Statistical correction is a crude device; one which 
betrays the minimum amount of guessing rather than the maximum 
degree. We assume the student is forced to guess, where he does 
guess, from among four choices. But, actually, a student meets 
alternative answers of unequal plausibility. Some choices are “dead
timber,” and even discriminating alternatives upon occasion can be
rendered inert by a little knowledge. Thus a student with some
relevant information may eliminate, with assurance of no penalty,
perhaps one or even two of the alternatives. With two eliminated, the
odds of being correct are even, and the standard correction formula
for the four-choice type of question does not apply. It is evident that a more adequate
correction device would reduce further the gap between the means of 
recall and recognition questions. Equally clear it is that psychologi- 
cally the gap can never be closed except as students master their 
courses. Recognition questions, in supplying cues and in presenting 
the correct answer, appeal to lower thresholds of response; intrinsically 
they are easier. Yet, as will be established later, correcting for guess- 
ing with its limitations yields advantages. 
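The argument that the formula undercorrects can be checked with expected values. The sketch below assumes, hypothetically, that a student who rules out some alternatives then picks at random among the rest; the four-choice penalty of one-third per wrong answer is kept throughout, and the function name is illustrative only.

```python
def expected_corrected_score(known, unknown, eliminated=0, choices=4):
    """Expected corrected score, right - wrong / (choices - 1), for a student who
    answers `known` items from knowledge and, on each of `unknown` items, rules
    out `eliminated` wrong alternatives before guessing at random among the rest."""
    remaining = choices - eliminated
    if not 1 <= remaining <= choices:
        raise ValueError("eliminated must be between 0 and choices - 1")
    p_right = 1 / remaining
    expected_right = known + unknown * p_right
    expected_wrong = unknown * (1 - p_right)
    return expected_right - expected_wrong / (choices - 1)


# Thirty items the student cannot answer outright, four choices each:
print(expected_corrected_score(0, 30, eliminated=0))  # 0.0
print(expected_corrected_score(0, 30, eliminated=2))  # 10.0
```

Blind guessing is cancelled exactly, but eliminating two alternatives on thirty unknown items leaves an expected ten points of credit that the formula does not remove.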

If, as is often contended, a test securing a greater scattering of 
scores is to be preferred over one which yields lower variability, then 
the correction for guessing in recognition forms must be considered 
a gain. Variabilities in recognition sections of all examinations 
were markedly increased after the correction formula was applied. 
Whereas before correction the recall sections yielded coefficients of
variation frequently twice as large as those prevailing in recognition
sections covering the same materials, after correction the relative
variabilities of recognition sections approximate those of recall
sections. Since in only two contrasts, emphasized by asterisks in the right-hand column
of Table II, the relative variability of recognition exceeds that of 
recall after the correction formula is used, it seems as if recall tests 
possess a distinct advantage over recognition tests. 
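Relative variability here is the coefficient of variation, 100 times the standard deviation divided by the mean. The sketch assumes no omitted items, so that the corrected score is a linear function of the number right, (4R - n)/3 for n four-choice items; that transformation stretches the standard deviation by 4/3 while lowering the mean, so the coefficient must rise. The scores and helper names are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(scores):
    """Relative variability: 100 * standard deviation / mean."""
    return 100 * pstdev(scores) / mean(scores)


def corrected(right, items, choices=4):
    """Corrected score assuming no omissions, so that wrong = items - right."""
    return right - (items - right) / (choices - 1)


# Hypothetical numbers right on a 40-item, four-choice recognition section.
rights = [36, 32, 28, 24]
before = coefficient_of_variation(rights)
after = coefficient_of_variation([corrected(r, 40) for r in rights])
print(round(before, 1), round(after, 1))  # 14.9 22.4
```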

The comparisons dealing with difficulty and variability have been 
based upon identical content which has been tested by two arrange- 
ments of recall and recognition tests. It has been assumed that the 
students who had the questions in one arrangement were comparable 
in knowledge and examination skill to those who had the questions in 
another arrangement. If the human factor is now kept constant and 
the course content varied, then similar results should indicate that 
differences in difficulty and variability are actually functions of the 
form of the test. Since the two examinations given in each of the 
three courses had equal numbers of both recall and recognition items, 
and since there were two forms of each examination, we may secure 
twelve contrasts. The scores for recall items in each form may be 
compared with the scores made in recognition items in the same form. 

The results of contrasting the two types of questions for varied 
content are so similar to those reported in Table II that it has not been 
considered essential to resort to tabular presentation. Differences 
between mean scores when recognition items are not corrected for 
guessing are equally large and, following correction, only two of the 
differences, one in the Educational and the other in the Social Psychol- 
ogy group, fail to exceed the acceptable criterion of three times their 
standard error. Difference ratios, always favoring recognition, are 
generally larger than those prevailing where identical content was 
under review, but the use of the formula for correlated measures is 
partially responsible. The relative variabilities behave similarly. 
For the uncorrected scores, recognition questions are from forty-seven 
to sixty-nine per cent as variable as recall questions and, after correc- 
tion, they range from sixty-six to one hundred per cent as variable as 
the recall items. Only two contrasts, however, indicate greater 
variability of the recognition questions, and again these fall within 
the Educational Psychology group. The conclusion that recall ques- 
tions are more difficult and more variable than recognition questions, 
when used in course examinations, seems well substantiated. 
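The difference ratio divides a difference between means by its standard error. For correlated measures (the same students answering both recall and recognition items) the usual formula subtracts a covariance term, SE_diff = sqrt(SE_1² + SE_2² - 2r·SE_1·SE_2), which shrinks the denominator when r is positive and so enlarges the ratio, in line with the remark above. The means, standard errors, and r in this Python sketch are hypothetical; the criterion of three standard errors is the one used in the text.

```python
from math import sqrt


def difference_ratio(mean_a, se_a, mean_b, se_b, r=0.0):
    """Difference between two means divided by its standard error.

    r is the correlation between the two sets of scores: 0 for independent
    groups, positive when the same students supply both sets.
    """
    se_diff = sqrt(se_a ** 2 + se_b ** 2 - 2 * r * se_a * se_b)
    return (mean_a - mean_b) / se_diff


def is_reliable(ratio, criterion=3.0):
    """True when the difference exceeds three times its standard error."""
    return abs(ratio) > criterion


# Hypothetical recognition (30.0) and recall (24.0) means:
print(round(difference_ratio(30.0, 1.2, 24.0, 1.6), 2))         # 3.0
print(round(difference_ratio(30.0, 1.2, 24.0, 1.6, r=0.6), 2))  # 4.61
```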
An assertion often met when discussions regarding the value of
correcting recognition questions for guessing are in process is that,
practically, the correction does not alter the grades received by
students. As long as a total score in an examination is a composite of
the part scores made in recall and recognition sections, the assertion
is not quite true. Rank orders will not be identical in both sets of
scores, for some students have done relatively better than others in
recall items, but have achieved fewer correct items in recognition.
Yet these rank changes are small, and the areas within which a score
can be labelled a grade of C, for example, are so large that course grades
will be altered infrequently and then at the boundary lines separating
one grade level from another. This is not an unsupported hypothesis.
Final grades have been determined by summating the examination
scores in midterm and final examinations with and without correction
for guessing. Of necessity the computation in all instances involved
the use of identical percentages of letter grades. In the Social and
Educational Psychology classes not one student would receive a different
letter grade as a result of the correction; in the Abnormal Psychology
classes the grades of six students changed. Each change occurs
at the boundaries arbitrarily separating, for example, a high D grade 
from a low C grade. The six rank order correlations are all in excess 
of +.99. Had the tests not differentiated senior college students as 
well as they did, another outcome might have been found. Yet the 
argument that course grades will change but little, if the same per- 
centages govern letter-grade determinations, must be respected. 

Of more import is the inquiry into the disposition of letter grades 
when judgment is influenced by examining distributions derived from 
either corrected or uncorrected scores. Traditionally, certain stand- 
ards have been erected in the first two years of college. We assume, 
however, that in upper division or senior college classes a process of 
selection has favored us with a large proportion of capable students. 
What grades are assigned is often dependent upon the range of scores 
and “natural breaks” or gaps between scores. Uncorrected recognition
scores are flattering to an instructor’s ego. They fall too often
in the upper reaches of the distribution curve. But we have shown
that “chance” factors in addition to knowledge have been represented
in these scores. The students do not know as much as the distributions 
indicate. What is more important is that, relatively, the good stu- 
dents are markedly better in comparison with the poor students than 
our distributions represent. By dealing with uncorrected scores, we 
undoubtedly give more high grades and fewer low grades than we would
assign if faced with scores reduced in size by a correction. The greater
variability resulting from the correction for guessing carries the changes in
judgment beyond the doubtful indecisions attendant upon assigning 
grades to the same distributions on different days. One’s judgment is 
decisively qualified by the increased distance from the top to the low 
end of the distribution, and by the clearer separations achieved 
throughout the whole range of the scale. This contribution of correc- 
tion to the raising of academic standards seems of greater consequence 
than the changes produced in a few instances in boundary-line grades. 
It justifies the slight additional labor involved in reducing the total 
recognition score by one-third of the number of wrong items. 

Where the practice is to use relative grades for the motivation of 
scholarship, or for curtailing the arguments students sometimes try to 
urge upon an instructor for raising their grades, the use of the correc- 
tion for guessing is helpful. Students think spatially rather than 
statistically. They are more convinced of the justice of a failing grade 
when confronted with a score which is one-third that of the highest 
score rather than nearer to one-half of it. More successful students 
realize better the difference in accomplishment represented by any one 
letter grade, for the area included within the grade boundaries is 
greatly extended. Thus it appears that with selected or homogeneous 
groups the increased variability secured by the correction for guessing 
in recognition questions more than compensates for the slight amount 
of time needed to make the computations. 

An item analysis of all questions used in the three sets of final 
examinations supports the previous analyses in showing the greater 
difficulty of the recall form of question. For each course, the per- 
centage of students answering each question correctly was computed. 
A more refined mode of analysis was attempted, one similar to that 
reported in an earlier study, but it must be admitted that techniques
applicable with more unselected students break down where greater 
homogeneity of student ability is present. This result was not unex- 
pected, since the writers had already reported lowered validity or 
reliability with increasing homogeneity in a second quarter of a General 
Psychology course. Table III shows that, in each class, a greater
proportion of the recognition questions than of the recall questions
was answered correctly by a larger number of students.

A brief elaboration of Table III may clarify the inequality of test
items which express the same content in two different ways. In all
arrangements of the final examinations there were five hundred ten
recognition and five hundred ten recall items. It may be observed
that of the recognition items one hundred nineteen, or twenty-three per
cent of the total, were answered correctly by ninety-one per cent or
more of the students, whereas of the recall items only thirty, or 5.9 per
cent, fell within the command of an equal proportion of students.
Comparable figures for other percentages of students are as follows:
two hundred sixty-eight recognition items, or fifty-three per cent, were
answered by more than seventy-five per cent of the students, in contrast
with one hundred thirteen, or twenty-two per cent, of the recall items;
fourteen recognition items, or 2.7 per cent, proved too difficult for at
least seventy-five per cent of the students, whereas eighty-five, or
seventeen per cent, of the recall items baffled an equal proportion of
students; and finally, only one recognition item, or .2 per cent, was
failed by more than ninety per cent of the classes, against twenty-seven,
or 5.3 per cent, of the recall questions.

TABLE III. ITEM ANALYSIS OF RECOGNITION AND RECALL QUESTIONS MATCHED
FOR IDENTITY OF CONTENT

Percentage of       Abnormal             Social               Educational
students answering  Recognition  Recall  Recognition  Recall  Recognition  Recall
correctly           N¹           N       N            N       N            N
96                  22           0       17           1       28           11
91                  14           4       19           7       19           7
86                  20           11      17           7       18           13
81                  17           15      17           10      16           3
76                  17           11      20           7       7            6
71                  12           5       14           9       26           20
66                  13           10      14           12      18           8
61                  16           14      4            11      9            15
56                  7            10      10           8       8            10
51                  7            12      3            10      7            6
46                  5            15      7            6       12           20
41                  3            8       5            10      7            8
36                  1            6       1            14      4            6
31                  3            10      4            10      4            9
26                  1            13      2            9       1            8
21                  1            8       5            6       3            16
16                  1            2       1            6       1            7
11                               2                    6       1            5
6                                2                    7       0            6
1                                2                    4       1            6
Total               160          160     160          160     190          190

¹ N = Number of questions.
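The counts quoted in the paragraph above can be re-derived from Table III. In this sketch the band labels are taken to be the lower limits of five-point bands, blank cells are entered as zero (which makes every column sum to its printed total), and the helper names are hypothetical.

```python
# Item counts by band, transcribed from Table III. Columns: Abnormal recognition,
# Abnormal recall, Social recognition, Social recall, Educational recognition,
# Educational recall. Blank cells in the printed table are entered as 0.
TABLE_III = {
    96: (22, 0, 17, 1, 28, 11),
    91: (14, 4, 19, 7, 19, 7),
    86: (20, 11, 17, 7, 18, 13),
    81: (17, 15, 17, 10, 16, 3),
    76: (17, 11, 20, 7, 7, 6),
    71: (12, 5, 14, 9, 26, 20),
    66: (13, 10, 14, 12, 18, 8),
    61: (16, 14, 4, 11, 9, 15),
    56: (7, 10, 10, 8, 8, 10),
    51: (7, 12, 3, 10, 7, 6),
    46: (5, 15, 7, 6, 12, 20),
    41: (3, 8, 5, 10, 7, 8),
    36: (1, 6, 1, 14, 4, 6),
    31: (3, 10, 4, 10, 4, 9),
    26: (1, 13, 2, 9, 1, 8),
    21: (1, 8, 5, 6, 3, 16),
    16: (1, 2, 1, 6, 1, 7),
    11: (0, 2, 0, 6, 1, 5),
    6: (0, 2, 0, 7, 0, 6),
    1: (0, 2, 0, 4, 1, 6),
}


def count_items(kind, low, high):
    """Number of items of `kind` whose band label lies in [low, high]."""
    columns = {"recognition": (0, 2, 4), "recall": (1, 3, 5)}[kind]
    return sum(row[c] for band, row in TABLE_III.items()
               if low <= band <= high for c in columns)


def share(count, total=510):
    """Percentage of all 510 items of one kind, to one decimal."""
    return round(100 * count / total, 1)


for kind in ("recognition", "recall"):
    easy = count_items(kind, 91, 96)  # answered by 91 per cent or more
    hard = count_items(kind, 1, 6)    # failed by roughly 90 per cent or more
    print(kind, easy, share(easy), hard, share(hard))
# recognition 119 23.3 1 0.2
# recall 30 5.9 27 5.3
```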

Reference has already been made to the attitudinal differences 
assumed by students toward the two kinds of questions. There is little 
doubt, however, regarding the inclination of students to achieve about 
equally well in both parts of the same examination. Marked exceptions
are to be found. More often than not, they show
students making unusually high scores in the recognition items and
mediocre but not the poorest scores in recall questions. Table IV
affords evidence for the degree of association between these two kinds
of items. Application of the formula for attenuation to the coefficients
given first and second under each of the course names yields only one
corrected coefficient of .95; the remaining five are 1.00 or higher.
The variability of the coefficients shown in Table IV indicates that 
although in each examination an attempt was made to cover com- 
parable subject-matter by both recognition and recall questions, and 
even though chance determined which of two paired items was placed in 
Form A or Form B, a great many variables must remain uncontrolled. 
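The correction for attenuation divides an obtained correlation by the geometric mean of the two reliabilities, r_xy / sqrt(r_xx · r_yy). This section does not say which reliability coefficients were used, so the inputs in the Python sketch below are hypothetical; they only show how a corrected coefficient above 1.00 can arise.

```python
from math import sqrt


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimated correlation between true scores: r_xy / sqrt(r_xx * r_yy).

    r_xx and r_yy are the reliabilities of the two measures. Sampling error in
    the inputs can push the result above 1.00.
    """
    if r_xx <= 0 or r_yy <= 0:
        raise ValueError("reliabilities must be positive")
    return r_xy / sqrt(r_xx * r_yy)


# Hypothetical: recognition-recall r = .80, reliabilities .75 and .80.
print(round(correct_for_attenuation(0.80, 0.75, 0.80), 2))  # 1.03
```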

But can we echo the phrase that the two forms of items are measur- 
ing the same thing when allowance is made for chance errors? Hardly! 
Leaving out the behavior differences manifested by students, we find
that the relative achievement in recall and corrected recognition items
of students in the top quarter on the final examinations differs from
that of students in the lowest quarter. The best students tend to do
equally well on both kinds of items. Had they done exactly as well,
their scores in recall items would comprise fifty per cent of the
examination total. Actually the separate scores range from forty to
fifty per cent of the total score, and the majority fall between
forty-five and fifty per cent. In contrast are the recall scores of the
lowest fourth of the total group. These scores range from twenty-five
to fifty-five per cent of the total score, with only one score in excess
of fifty per cent and many less than forty per cent. The poorer
students are more variable, but the recall type of question is quite
definitely more difficult, relatively, for them than for their more
successful classmates.
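The quarter comparison above rests on a simple ratio: each student's recall score as a percentage of the examination total, taken as recall plus corrected recognition. A sketch under that assumption, with hypothetical helper names and data; with equal performance on both halves the share is fifty per cent.

```python
def recall_share(recall, recog_right, recog_wrong, choices=4):
    """Recall score as a percentage of the total, where the total is recall
    plus recognition corrected for guessing."""
    total = recall + recog_right - recog_wrong / (choices - 1)
    if total <= 0:
        raise ValueError("total score must be positive")
    return 100 * recall / total


def quarter_shares(students, choices=4):
    """students: (recall, recognition right, recognition wrong) triples.
    Returns the recall shares of the top and bottom quarters by total score."""
    def total(s):
        return s[0] + s[1] - s[2] / (choices - 1)

    ranked = sorted(students, key=total, reverse=True)
    q = max(1, len(ranked) // 4)
    top = [recall_share(*s, choices=choices) for s in ranked[:q]]
    bottom = [recall_share(*s, choices=choices) for s in ranked[-q:]]
    return top, bottom


# Hypothetical students: (recall, recognition right, recognition wrong).
top, bottom = quarter_shares([(30, 30, 0), (10, 30, 3), (20, 20, 6), (5, 20, 12)])
print(top, [round(s, 1) for s in bottom])  # [50.0] [23.8]
```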

We ask, finally, whether or not acquaintance with questions in the 
midterm tests expressed in one form will facilitate the answering of the 
TABLE IV. INTERCORRELATIONS FOR THE NEW-TYPE TESTS IN THREE COURSES

Test                      Scores correlated              Correlation¹
Abnormal Psychology
  [illegible]             Recognition and recall         .90
  [illegible]             Recognition and recall         .78
  [illegible]             Recognition and recall         .72
  [illegible]             Recognition and recall         .64
  Mid-term A–Final A      Recognition and recognition    .87
  Mid-term B–Final B      Recognition and recognition    .67
  Mid-term A–Final A      Recall and recall              .74
  Mid-term B–Final B      Recall and recall              .84
  Mid-term A–Final A      Total score and total score    .72
  Mid-term B–Final B      Total score and total score    .64
Social Psychology
  [illegible]             Recognition and recall         .77
  [illegible]             Recognition and recall         .89
  [illegible]             Recognition and recall         .69
  [illegible]             Recognition and recall         .68
  Mid-term A–Final A      Recognition and recognition    .80
  Mid-term B–Final B      Recognition and recognition    .76
  Mid-term A–Final A      Recall and recall              .74
  Mid-term B–Final B      Recall and recall              .90
  Mid-term A–Final A      Total score and total score    .91
  Mid-term B–Final B      Total score and total score    .81
Educational Psychology
  [illegible]             Recognition and recall         .87
  [illegible]             Recognition and recall         .83
  [illegible]             Recognition and recall         .45
  [illegible]             Recognition and recall         .53
  Mid-term A–Final A      Recognition and recognition    .48
  Mid-term B–Final B      Recognition and recognition    .46
  Mid-term A–Final A      Recall and recall              .74
  Mid-term B–Final B      Recall and recall              .46
  Mid-term A–Final A      Total score and total score    .73
  Mid-term B–Final B      Total score and total score    .54

¹ Correlations are reported only on scores uncorrected for guessing since the
differences between intercorrelations based on scores uncorrected and those 
corrected for guessing are not statistically significant. 
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questions expressed in another form in final examinations? A hind- 
rance to the economical use of new type examinations has been a fear 
of repeating items from one test to another. It has been assumed that 
students will remember the items for personal or group gain. How 
much is remembered, or at least how much appropriate information is 
utilized after a five-week interval, can be estimated. A distinct 
advantage may be gained by using tests early in a course as teaching
devices. If we desire students to gain facility with basic terminology
and concepts, perhaps an examination is the most direct means of
revealing to the student what the instructor considers most
important. Then we might inquire into the type of question most 
likely to fix the basic ideas. Nothing more than a hypothesis can be 
established regarding this latter consideration, since these examinations 
were not discussed with the students. Any correction of misinforma- 
tion, or acquisition of necessary facts to reduce the lack of knowledge 
revealed by the examination questions, must have depended upon the 
student’s motivation. New-type tests have distinct teaching advan- 
tages which could not be utilized on this occasion because of the 
experiment. 

If all parts of examinations composed of recall questions were 
equally difficult, then the effect of acquaintance could be securely 
derived by contrasting the mean scores in midterm tests with the mean 
scores of both the final even and the final odd recall items taken sepa- 
rately. One would expect approximate identity of means for the 
midterm and the final odd items, because neither of these two groups 
of forty items could profit from previous acquaintance. If the mean 
score of the forty even-numbered recall items exceeds that of the other
groups, the excess can probably be attributed to acquaintance with,
and remembering of, some of the questions expressed in
recognition form in the midterm test. But not all groups of forty 
items are of equal difficulty. This conclusion can be sustained by 
internal evidence. 

The means of the midterm recall items are always lower than the 
means of the final even recall questions. Of the four differences avail- 
able in this study, two yield difference ratios of 8.5 and 4.5, the former 
being located in Abnormal Psychology, Form A, and the latter in 
Social Psychology, Form A. The remaining two difference ratios are 
2.2 and 1.76. Now, it can be shown that where these differences are 
large, there will be found relatively easy recognition questions in the 
midterm tests, and where the differences are small the midterm recognition
questions were difficult. Two examples illustrate this.
In examination Form A in Abnormal Psychology the mean for the 
even recall items in the final examination was 27.6 and the mean for 
the midterm recall was 20.4. The difference of 7.2 has a sigma of the 
difference of 0.85. Now these final even recall items had appeared in 
midterm tests as recognition items in Form A, on which occasion they 
proved much easier than the recognition items in Form B. Consider 
now the smallest difference between midterm and final even recall 
items, one which appeared in Social Psychology Form B. The mean 
for the midterm items was 20.6 and for final examination items 22.4. 
The difference is 1.8 ± 1.43. These even recall items had previously
appeared in midterm tests as recognition questions in Form B, where
they had proved to be difficult. Thus, for recall questions, the amount
of transfer from midterm to final examinations hinges partly upon the
ease or difficulty of the midterm recognition questions. It should be
noted, however, that although the differences always favor the final
even recall items, they are not uniformly statistically significant.
The means of the odd-numbered
items in the final examinations are always more like those of the mid- 
term means than they are like the means of even-numbered items in 
the final tests. 
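The difference ratios above are the difference between two means divided by the sigma of that difference. The article does not show how each sigma was obtained; the helper for it below assumes the standard formula for correlated means, since midterm and final scores come from the same students. Only the Abnormal Psychology, Form A figures quoted in the text are used; the function names are illustrative.

```python
import math


def sigma_of_difference(se_a: float, se_b: float, r_ab: float = 0.0) -> float:
    """Standard error of a difference between two means.

    r_ab is the correlation between the two sets of scores; a positive
    correlation (same students tested twice) shrinks the sigma.
    """
    return math.sqrt(se_a**2 + se_b**2 - 2 * r_ab * se_a * se_b)


def difference_ratio(mean_a: float, mean_b: float, sigma_diff: float) -> float:
    """Difference between two means divided by the sigma of the difference."""
    return (mean_a - mean_b) / sigma_diff


# Abnormal Psychology, Form A: final even recall 27.6, midterm recall 20.4,
# sigma of the difference 0.85 (figures from the text).
ratio = difference_ratio(27.6, 20.4, 0.85)
print(round(ratio, 1))  # prints 8.5
```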

The recognition questions have been subjected to a similar type of 
analysis. Two differences between midterm recognition and final 
examination recognition items, Abnormal Form B and Social Form B, 
are sufficiently large to have difference ratios of 5.9 and 4.8,
respectively. In both cases, these recognition questions had been the
easier recall questions in the midterm tests. In one comparison
(Abnormal, Form A) no difference existed, and in the other (Social,
Form A) a negative transfer seems to be operating; again, as recall
questions these items had been the more difficult. It seems most
likely, therefore, that acquaintance with recognition items in the
midterm tests will render more aid in the answering of recall items
covering identical content than will contact with recall questions.
Likewise, it appears established that easy midterm tests, whether of
recognition or recall type, transfer their effects to final examinations
more than difficult tests do. The data
suggest the use of recognition tests early in a course for teaching pur- 
poses and the use of recall tests for the measurement of achievement. 


SUMMARY AND CONCLUSIONS 


This study of the comparative reliability and relative difficulty of 
recognition and recall questions based on identical content is unique 
because the questions were administered as an intrinsic part of college
courses. 

The subjects were students enrolled in three advanced courses; 
namely, Abnormal Psychology, Social Psychology, and Educational 
Psychology. All students were given two tests, a midterm examina- 
tion and a final examination. These examinations included recall 
questions requiring the completion of a sentence by a single appropriate 
word, and single-choice recognition questions, in which one of four 
alternative answers was correct. In each examination, there was 
always the same number of recognition and recall questions. Identity 
of content for recognition and recall questions was secured by reversing 
the form of the questions in an alternate examination form. Thus, 
in each of the three courses, the recognition questions of Form A, 
either midterm or final examinations, became the recall questions of 
Form B and vice versa. There were, therefore, twelve examinations:
two midterm and two final examinations for the one hundred eighty-five
students in Abnormal Psychology, and the same number of examinations
for the eighty-five students in Social Psychology and the
forty-eight students in Educational Psychology. The two groups of 
students in each class were matched, it is to be hoped, either by a 
random selection or by achievement in courses in psychology already 
completed. Each midterm examination in the Abnormal and Social 
Psychology courses consisted of eighty items, forty recognition ques- 
tions and forty recall questions; whereas the midterm examinations 
in Educational Psychology were made up of fifty-four items, twenty- 
seven recognition questions and twenty-seven recall items. The final 
examinations in Social Psychology and Abnormal Psychology included 
one hundred sixty items each and the final examinations in Educational 
Psychology, one hundred ninety questions. 

The major points of this article are set forth in the following short
paragraphs:

(1) Coefficients of reliability, based upon the correlation of odd 
and even items, show that questions of the recall type are generally 
more reliable than those of the recognition type. 

(2) The correction of scores on recognition tests, undertaken to 
offset the effects of guessing, decreases the averages and increases the 
variability of scores. It does not, however, exert any uniform effect 
on the reliability of the tests; some coefficients are raised and others are 
lowered. 
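Point (2) can be illustrated with the conventional correction for guessing on single-choice items with four alternatives: rights minus one third of the wrongs. The article does not print the formula it used, so this is an assumption, and the scores below are hypothetical.

```python
from statistics import mean, pstdev


def corrected_for_guessing(rights: int, wrongs: int, choices: int = 4) -> float:
    """Conventional correction: rights minus wrongs / (choices - 1).

    Omitted items are neither credited nor penalized.
    """
    return rights - wrongs / (choices - 1)


N_ITEMS = 40                       # assumed 40-item recognition section
raw = [34, 30, 27, 25, 22, 18]     # hypothetical rights scores
# Assume no omissions, so every item not answered rightly was answered wrongly.
corrected = [corrected_for_guessing(r, N_ITEMS - r) for r in raw]

print(mean(raw), round(mean(corrected), 2))                  # mean falls
print(round(pstdev(raw), 2), round(pstdev(corrected), 2))    # spread grows
```

Because this toy example assumes no omitted items, the corrected score is simply (4R − 40)/3, a linear rescaling: the mean falls and the standard deviation grows by exactly 4/3, but no one's rank can change. In practice students omit different numbers of items, which is how a correction can move a few students across a grade boundary.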

(3) Recall questions are both more difficult and more variable than
recognition questions based on identical or on different content.
Application of the correction formula to recognition questions neces-
sarily reduces discrepancies between means, but it leaves differences
ordinarily considered significant.

(4) An empirical check was made to ascertain if the correction of 
recognition scores would change course grades. Only six students, 
out of a total of three hundred sixteen students, would have received 
a change of one letter grade. All reversals occurred at the boundaries 
separating one grade from another. Since correction of recognition 
scores increases their variability, it is plausible to assume that the 
instructor’s task would be simplified in assigning grades to a selected 
group of students and that different percentages of letter grades would 
be given. Likewise, students might be motivated to work more 
diligently could they be made to realize the greater discrepancies in 
achievement resulting from applying the correction formula. 
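The empirical check in point (4) amounts to grading the same students twice, once on uncorrected and once on corrected totals, and counting the disagreements. The sketch below assumes grades are assigned by rank with fixed, hypothetical shares for each letter; the article does not describe its grading scheme, and the function names are illustrative only.

```python
def grades_by_rank(scores, shares=(0.10, 0.20, 0.40, 0.20, 0.10), letters="ABCDF"):
    """Assign letter grades by rank, giving each letter a fixed share of the class.

    The shares are hypothetical. Ties are broken by list order
    (Python's sort is stable, also with reverse=True).
    """
    n = len(scores)
    # Cumulative rank boundaries, e.g. n=10 -> [1, 3, 7, 9, 10]
    bounds, cumulative = [], 0.0
    for share in shares:
        cumulative += share
        bounds.append(round(cumulative * n))
    bounds[-1] = n  # guard against rounding leaving the last students ungraded
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    grades = [""] * n
    for rank, idx in enumerate(order):
        band = next(b for b, bound in enumerate(bounds) if rank < bound)
        grades[idx] = letters[band]
    return grades


def count_grade_changes(uncorrected, corrected):
    """Number of students whose letter grade differs between the two score sets."""
    before = grades_by_rank(uncorrected)
    after = grades_by_rank(corrected)
    return sum(b != a for b, a in zip(before, after))


uncorrected = [95, 90, 85, 80, 75, 70, 65, 60, 55, 50]
corrected = [90, 95, 85, 80, 75, 70, 65, 60, 55, 50]  # first two students swap ranks
print(count_grade_changes(uncorrected, corrected))  # prints 2
```

As in the study, a change can only occur for students whose corrected score carries them across a grade boundary.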

(5) An item analysis of the five hundred ten recognition and five 
hundred ten recall items based upon identical content also indicates 
the greater difficulty of recall questions. Student reports unquestionably
substantiate the conclusion based upon difference ratios and item
analyses.

(6) A comparison of the students in the top quarter of the final
examinations with those in the lowest quarter, with respect to
relative performance in recall and recognition items, shows that the
poorer students are not only more variable, but that they find the recall
type of question more difficult than do the superior students.

(7) To test the effect of acquaintance with questions in the mid- 
term examinations stated in one form upon the answering of these 
questions expressed in another form in the final examinations, all 
questions in the Social Psychology and Abnormal Psychology midterm 
examinations were repeated. The results show that recognition items, 
encountered in the midterm tests, can be expected to aid students to a 
greater extent in the answering of recall items covering identical con- 
tent than will acquaintance with recall questions. Likewise easy 
midterm recognition or recall questions seem to transfer their effects 
to final examination proficiency more than difficult questions. These 
data suggest the value of recognition tests for the teaching of basic
terminology and the use of recall tests in the measurement of final
achievement. 
A VALIDATION OF THE LOOFBOUROW-KEYS
PERSONAL INDEX OF PROBLEM BEHAVIOR IN 
JUNIOR HIGH SCHOOLS 


WINIFRED C. RIGGS AND ARNOLD E. JOYAL 


University of Denver 


G. C. Loofbourow, in a study entitled The Validation of Test 
Materials for Problem Behavior Tendencies in Junior High School Boys, 
attempted to develop a set of test materials designed to prognosticate 
behavior tendencies in boys in the seventh, eighth, and ninth grades. 
In order to do this, he collected all known tests which purported to 
measure problem behavior tendencies. By careful validation on 
six hundred twelve problem-behavior boys, most of whom lived in the 
San Francisco Bay area, he was successful in developing a set of test 
materials which proved to be, at least in his experimental situation, a 
valid measuring instrument. He says in his study, “The chances
appear to be ninety in one hundred that a pupil making such a score
(a so-called ‘critical score’) belongs in the problem classification.”⁵

Following the publication of his study in 1932, Loofbourow, with 
the assistance of Noel Keys, of the Department of Education at the 
University of California, continued to work on the problem of refining 
the tests. He was especially interested in abbreviating the materials 
so that they could be used in the practical school situation. Over a 
period of two or three years he and Keys experimented with materials, 
and in 1934 they developed the Loofbourow-Keys Personal Index, which 
is published by the Educational Test Bureau. This index, designed 
to be administered in forty minutes, purports to indicate, with 
a high degree of reliability and reasonable validity, the potentially 
bad boys in a junior-high-school situation. Boys who receive a score 
greater than the critical score of 40 on the test are likely to be problem- 
behavior cases. Results of the application of the test in a number of 
situations seem to indicate that it is a fairly good measuring instrument. 

It is frequently found to be true, however, that tests of this kind, 
developed and validated in an experimental situation, fall down when 
applied to an entirely different group of children in a completely new 
setting. The purpose of the authors of this study, therefore, was to 
check the validity of the Personal Index for a representative group of 
boys in Baker Junior High School in Denver, Colorado. The criterion
of validity was to be the ratings of the school's boys’ advisor, made
at the end of the subjects’ junior-high-school career.
Baker Junior High appeared to be a particularly good school in
which to attempt to validate the index, since the school is located in 
that part of the city which, generally speaking, produces the largest 
amount of delinquency, and a considerable number of problem-behavior 
cases. The number of boys sent from Baker Junior High to the State 
Reform School at Golden, Colorado, is relatively large. The school 
has approximately eight hundred fifty pupils and twenty-eight teachers, 
and includes only the seventh and eighth grades. There are, on the 
average, about one hundred to one hundred twenty-five boys who enter 
as beginning seventh-graders each semester. 

Early in 1934 a group of boys who entered the school in January 
of that year were tested. A second group of boys entered in September, 
1934, and were tested that fall. The total number tested was one 
hundred eighty-six, all of whom were seventh-graders. This number 
constituted all the boys who were present on the days selected for 
administering the Index. The boys’ advisor in the school* had no 
knowledge of the scores made on this Index. At the time of administer- 
ing the index he had not even seen a copy of it. After the test was 
administered, the authors explained the purpose of the study. 
Although he had had little opportunity to become acquainted with 
the boys involved, the advisor was asked to classify the one hundred 
eighty-six boys in this study in terms of their behavior as he knew it 
then. The authors requested him to select the fifteen to twenty boys 
whom he believed to be the very worst problem cases. He was asked 
to select another fifteen to twenty bad boys. He was asked to write 
down the names of the fifteen to twenty very best behaved boys in the 
school and, also, another fifteen to twenty good boys, who, while not 
among the very best, were next to the best. The advisor made these 
selections and, doubtless, they reflected to a considerable extent a true 
picture of the extremes of bad behavior and good behavior in the 
seventh grade at that time. Two years later, when these one hundred 
eighty-six boys were about to finish the eighth grade in 1936-1937, the 
advisor, who had not seen the scores in the meantime, was asked to 
rate the boys in terms of problem behavior. He was asked again to 
pick out about fifteen of the very worst boys, about fifteen bad boys, 
about fifteen good boys, and about fifteen very good boys. The selec- 
tion was based on the records kept in the advisor’s office, and probably 
reflected quite accurately the grouping of the boys based on their actual 
behavior in school as he knew it. This final selection of very good, good,
bad, and very bad boys forms the criterion against which the final
coefficients of validity were computed for the Loofbourow-Keys
Personal Index for the purposes of this study.

* Ivan B. McClure.

Having obtained these ratings, and knowing the scores of the boys, 
it became necessary for the authors to compute the degree of relation- 
ship between the score made by each boy on the test and his rating 
by the advisor. Table I presents the final ratings made by the advisor
after two years had elapsed. Behind these ratings were two years of
contact with the boys, not only on the part of the advisor, but also
on the part of the teachers. The following records were used in rating
the boys: their attendance, their tardiness, the number of times they
had been sent to the office, and the reasons therefor. Consequently, each
rating is a summary of the boy’s record in the school as to cooperation,
adjustment to the social groups, antisocial traits, and ability to fit into
life as he met it in school. These ratings were used as the criterion
of validity.

It is not possible with such data as these to employ the usual meth- 
ods of computing coefficients of correlation. The Pearson’s Product- 
moment Formula may be used only when the two variables are 
distributed over a quantitative scale. Fortunately, however, there is a 
method, biserial r, which may be employed, although it is necessary to 
adapt the data in this particular study to the formula. 
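For reference, the usual biserial formula is r_bis = (M_p − M_q)/σ_t × pq/y, where M_p and M_q are the mean scores of the two groups, σ_t is the standard deviation of all scores, p and q are the proportions of cases in the groups, and y is the height of the normal curve at the point dividing them. A minimal sketch follows, using hypothetical scores. Table II reports weighted counts without describing the weighting, so this unweighted version is not a reconstruction of the reported coefficients.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(upper_group, lower_group):
    """Biserial correlation between a continuous score and a two-way split.

    Assumes the split (here, problem vs. non-problem) cuts an underlying
    normally distributed trait:
        r_bis = (M_p - M_q) / sigma_t * (p * q / y)
    """
    scores = list(upper_group) + list(lower_group)
    p = len(upper_group) / len(scores)
    q = 1.0 - p
    sigma_t = pstdev(scores)                      # SD of all scores combined
    y = NormalDist().pdf(NormalDist().inv_cdf(q))  # normal ordinate at the split
    return (mean(upper_group) - mean(lower_group)) / sigma_t * (p * q / y)


# Hypothetical index scores, not the Baker Junior High data:
problem = [86, 72, 64, 59, 53, 48, 41]
non_problem = [52, 41, 35, 32, 28, 25, 19, 14]
print(round(biserial_r(problem, non_problem), 2))
```

Unlike the product-moment coefficient, biserial r can exceed 1.00 when scores within the groups are far from normally distributed.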

Actually one hundred eighty-six boys were used in this study. The
four groups of boys, as selected at the beginning of the two-year period
in 1934, consisted of fifteen “best,” nineteen “good,” nineteen
“bad,” and twenty “worst” boys. The fifteen “best” boys
made scores on the index of 41, 40, 35, 32, etc., as indicated in column 1
of Table I. The “good” boys made scores of 74, 69, 53, 52, etc., as
indicated in column 3 of Table I. Scores for the “bad” boys and
“worst” boys are also listed in the same table.

It will be noted in Table II that the mean index score for the group
of problem boys was 53.3. The mean of the non-problem group was
34.8. The difference between these arithmetic means was 18.5.
The coefficient of biserial correlation between the ratings of the boys’
advisor and the scores made on the index was +.58 ± .04. It is
emphasized that this is the relationship between the scores and the
initial rating of the boys, made when they had first entered the
junior high school in 1934.

Columns 2, 4, 6, and 8 of Table I present the scores made by the
boys selected for each of the four groups at the end of the two-year
period in 1936. The extreme right-hand column of Table II presents
the corresponding weighted counts, mean index scores, and coefficients
of biserial correlation for this second grouping of the boys, which was
made at the end of the junior-high-school period in 1936. It will be
noted that at the end of the period the coefficient of biserial correlation
was +.48 ± .04, which is approximately ten points lower than the
correlation obtained at the initiation of the study.

TABLE I.—SCORES ON THE LOOFBOUROW-KEYS PERSONAL INDEX FOR FOUR
GROUPS OF BOYS AS SELECTED BY THE BOYS’ ADVISOR IN 1934 AND AGAIN
IN 1936, BAKER JUNIOR HIGH SCHOOL, DENVER

     "Best" boys     "Good" boys      "Bad" boys    "Worst" boys
    1934    1936    1934    1936    1934    1936    1934    1936
     (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)
      41      86      74      84      86      91     108     108
      40      60      69      69      79      86      96      92
      35      50      53      55      72      78      77      79
      32      41      52      54      65      64      69      77
      31      35      51      52      61      59      64      64
      30      26      48      52      59      56      64      59
      29      22      38      49      53      54      64      53
      19      19      36      49      52      53      59      48
      19      19      35      49      45      52      54      47
      17      18      35      46      41      52      53      43
      17      17      35      41      39      50      52      43
      16      14      32      36      36      45      49      40
      14      13      28      34      34      43      48      50
      13      12      28      28      34      34      43      37
      12       ?      25      27      33      33      43      32
                      24      27      32      30      40      32
                      22      24      32      28      40      20
                      15      22      28      19      32
                      13      19      27       ?      23
                                                      19

? Illegible in the original.

This is not a very high degree of positive relationship. It is not as 
high a degree as was obtained by Loofbourow in the original studies. 
A coefficient of +.48 means that there is a fair amount of positive 
relationship, but the relationship is not sufficiently great to assure any 
consistency of predictions. Perhaps not over one boy in three thus 
indicated is likely to become a problem-behavior boy. 
The authors of this Personal Index have indicated their belief that
the test is more valid when Part III of the index is omitted.
Part III of the test had been borrowed from Hartshorne and May, who
held that boys who lay claim to excessive virtues on such matters as
are indicated in the test are more serious problem-behavior cases than
boys who charge themselves with many shortcomings. In order to
check this hypothesis, the writers rescored all the tests, omitting
Part III, and then recomputed the whole problem.

When this was done for the initial rating the mean index scores on 
the test were found to be 41.4 for the problem group and 25.0 for the 
non-problem group—a difference of 16.4. For the final rating the mean 
index scores were 40.2 for the problem group, and 24.6 for the non- 
problem group—a difference of 15.6. The biserial correlations between 
the scores and the ratings were found to be +.62 and +.52, which are
slightly higher than the coefficients previously obtained. It would
seem, therefore, that it is better to omit Part III. 

TABLE II.—MEAN INDEX SCORES AND BISERIAL CORRELATIONS BETWEEN INDEX
SCORES AND PROBLEM-BEHAVIOR RATINGS MADE AT BEGINNING AND END
OF TWO-YEAR JUNIOR-HIGH-SCHOOL PERIOD

                                              Initial rating   Final rating
                                                   1934            1936
Number of boys (actual count) ...............       186             186
Number of boys (weighted count)
  Problem group .............................        58              53
  Non-problem group .........................        83              80
  Total .....................................       141             133
Mean index scores on tests 1, 2, 3, 4
  Problem group .............................      53.3            51.8
  Non-problem group .........................      34.8            35.1
  Difference ................................      18.5            16.7
Mean index scores on tests 1, 2, and 4 (test 3 omitted)
  Problem group .............................      41.4            40.2
  Non-problem group .........................      25.0            24.6
  Difference ................................      16.4            15.6
Biserial correlations between Personal Index scores and ratings
  Tests 1, 2, 3, 4 ..........................   +.58 ± .04      +.48 ± .04
  Tests 1, 2, and 4 (test 3 omitted) ........   +.62 ± .04      +.52 ± .04

In terms of these findings neither of these coefficients is sufficiently
high to warrant any very great degree of confidence in the Loofbourow- 
Keys Personal Index, at least not when administered in the conditions 
set up for this experiment. Doubtless the index might be helpful in 
indicating students who are seriously maladjusted, in that an exces- 
sively high score on the test would indicate that something is wrong 
with the boy and that he needs to be closely supervised. Of course, 
as will be mentioned later, the fault may rest, not with the test at all, 
but with the criterion. 

Another purpose behind this study was to determine whether or
not the score of 40, mentioned by the authors of the test as the “critical
score,” was actually the point above which problem-behavior cases
were likely to be found. The evidence accumulated in this study shows
that when the whole test is used the score 40 is much too low.
Forty-four per cent of all the pupils who were given the test received a
score of 40 or above, and it is inconceivable that such a large
proportion of the boys are problem-behavior cases. A score of 55 is
much more defensible as the critical score. Although no data
specifically substantiate the selection of this score, the evidence
indicates that it is a better figure than 40. When Part III of the test is
omitted (and this study shows that it does not contribute to the validity
of the test), the score 40 would appear to be more nearly correct,
although in general the boys identified as problem-behavior cases
received scores exceeding 35 on the three remaining parts of the test.
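Choosing a critical score trades flagging problem cases against falsely flagging others. The sketch below computes, for any cutoff, the share of each criterion group scoring at or above it; the scores are hypothetical, not the Baker Junior High data, and the function names are illustrative.

```python
def share_at_or_above(scores, cutoff):
    """Proportion of scores at or above a critical score."""
    return sum(score >= cutoff for score in scores) / len(scores)


def screening_rates(problem, non_problem, cutoff):
    """Return (share of problem boys flagged, share of non-problem boys flagged)."""
    return share_at_or_above(problem, cutoff), share_at_or_above(non_problem, cutoff)


# Hypothetical scores for boys already classified by a criterion rating:
problem = [72, 64, 58, 51, 47, 39]
non_problem = [52, 41, 36, 33, 28, 22, 18]
for cutoff in (40, 55):
    hit, false_alarm = screening_rates(problem, non_problem, cutoff)
    print(cutoff, round(hit, 2), round(false_alarm, 2))
```

Applied to all pupils tested, `share_at_or_above(all_scores, 40)` gives the kind of figure reported above (forty-four per cent). In this toy data, raising the cutoff from 40 to 55 stops flagging non-problem boys but also misses more problem boys.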

One should not immediately jump to the conclusion, however, that 
this particular study necessarily invalidates the results previously 
obtained by the authors of the Index. All that can be said is that 
under these particular conditions, for the time allowed in the study of 
the boys involved, assuming that the ratings of the advisor are perfect 
indications of true problem behavior, the index is not very valid. 
However, the criterion may be wrong, and, in the opinion of the writers,
a faulty criterion is probably a reasonable explanation.

A boys’ advisor in junior high school may tend to rate as most 
serious problem-behavior cases those extrovert boys who cause him 
and the teachers most trouble. It is the boy who makes the most 
noise, causes the most disturbance, most often plays truant, and 
similarly gives evidence of overt action, who is to him the most serious 
problem-behavior case. As a matter of fact, these boys may be good 
boys. Time often shows that the most serious behavior cases among adults
are the quiet, retiring characters who behaved reasonably well in
school, but who, behind the backs of the teachers and the advisor, and
completely unknown to them, were seriously maladjusted. Such
persons fall in the classification of introverts, and psychiatrists tell us
that the introverts are the ones who more often cause trouble than the
extroverts.

It is especially to be noted that the writers do not state that the
index is invalid. Such a statement should not be made, if at all, until
enough time has passed to show clearly which of the boys, over a long
period of years, have proved to be the problem-behavior cases. A
serious drawback of this study was the inability to follow the boys
over a longer period of time. Crowded conditions in the building made
it necessary to send the boys to high school at the end of the second
year instead of the third year normally belonging to the
junior-high-school period. Further, the economic conditions of the
particular attendance district take large numbers of the boys from
the schools at the end of the eighth grade. They are lost in the life
of the city, or they move away from the follow-up reach of the school.
If ten years from now a check-up could be made of these same boys,
one might find that many of those who made high scores on the index,
but who were not rated by the school as among the problem boys,
might be in difficulty of one kind or another. Likewise, it might equally
be true that some of the boys rated as problem-behavior cases, and
receiving low index scores, might be among the good citizens, despite
the fact that in junior-high-school days they were boisterous and
thoughtless individuals.
SUMMARY


This study has presented the results of an attempt to validate the
Loofbourow-Keys Personal Index in a junior high school in Denver,
Colorado. The Personal Index is a forty-minute test which purports
to select, with considerable reliability and reasonable validity, the
potential problem-behavior cases among boys of junior-high-school
age. The Personal Index is based on research conducted in the San
Francisco Bay area by G. C. Loofbourow, of the Fresno Public Schools,
and Noel Keys, of the University of California.

The subjects of this study were one hundred eighty-six boys who
constituted the seventh-grade class entering Baker Junior High School
in 1934. After the test was administered and scored, the boys’
advisor in the school was asked twice to select the “best” and the
“worst” boys in the school, in terms of problem behavior. He made
his first selection shortly after the boys entered the school and, without
seeing the scores in the meantime, made a second selection
approximately two years later.

Using as a criterion of validity the selection of problem and
non-problem boys made by the boys’ advisor, coefficients of biserial
correlation between the ratings of the advisor and the scores on the
test were computed. The results indicated a coefficient of correlation
of .58 between the scores and the initial rating, and of .48 between
the scores and the final rating.

This is not as high a degree of relationship as was obtained by the
authors of the Index, and it may be explained in several ways. Perhaps
the criterion used is not a truly valid one. Possibly the boys’ advisor
rates as most serious the overt action and extrovert tendencies that
make a boy troublesome in junior high school, whereas it is the quiet
introvert type which is most likely to be the serious problem-behavior
case. Several other possible explanations are suggested in the study.
THE EFFECT OF MOTIVATION ON CHANGES IN
VARIABILITY DURING PRACTICE¹


ZED H. BURNS 
Appalachian State Teachers College, Boone, N. C. 


THE PROBLEM 


The psychological literature dealing with the effects of practice 
upon changes in variability is extensive, and yet nowhere in it is to 
be found any explanation of how motivation acts upon the variability 
of a group during practice. 

This is the problem which this experiment attacks. In other
words, given a group to be practiced and granting that motivation is
effective, who is most affected by the motivation, those who stand 
initially high or those who are lowest to begin with? Or, on the other 
hand, does motivation affect most those who fall in between the two 
extremes? 

If there were in existence definite and reliable knowledge as to 
what happens to a group—any group—when practiced in a given func- 
tion, be it mental arithmetic or tossing the yo-yo; and further, if there 
existed any clue to what motivation is, or should be, as regards such a 
practice group, the problem raised would be a fairly simple one. 
Unfortunately, this is far from being the case. It seems doubtful 
whether there is any other topic of psychological research where so 
much experimentation has gone forward for almost thirty years with 
so few conclusive results. The fact is that today no one knows with 
certainty what the effect of practice is upon the variability of a group, 
or, to vary the wording, how training affects individual differences. 
Far less is there any agreement as to what conditions constitute 
adequate or sufficient motivation in such an experiment. There is 
no need to go into this question here, as it has been treated recently
elsewhere.²

Therefore, the problem which this study attacks is not a simple
question of what part motivation plays under known conditions. 
What happens to the variability of a group during practice when 


1 This article is an adaptation of material reported in full in the author’s 
doctoral study at the Teachers College of the University of Cincinnati. Grateful 
acknowledgment is herewith made to the many individuals and groups who aided 
in making this investigation possible. 

2 Burns, Zed H.: “Practice, Variability and Motivation.” Journal of Educational
Research, February, 1937, pp. 403-420.
practice takes place under normal conditions, assuming normal condi-
tions to be the sort of motivation which other investigators who have 
studied the problem have used? Further, how is the factor of motiva- 
tion to be isolated so that it may be studied? There seems to be 
only one chance of doing this. This is to perform a practice experi- 
ment with two groups of subjects as nearly alike in all respects as 
possible, and to vary the procedure with the two groups in such a 
way as to motivate one of them normally and the other to a high degree.
At the end of practice, if it be found that the highly motivated group 
has departed from the normally motivated group in terms of varia- 
bility, this departure may properly be attributed to the differential 
effect of the high motivation. This is what the present experiment 
did. 


PROCEDURE 


In order to obtain the data upon which the conclusions of this 
report are based, it was necessary to administer, score, and record some 
six thousand individual tests. These tests were distributed over a 
period of six weeks. About the same length of time was consumed 
in the preliminary part of the investigation. This consisted of 
collecting psychological and English scores of the subjects, developing 
code writing and addition testing materials, and giving preliminary 
tests used in equating the two groups. 

There were available for the subjects of this investigation the
scores made by them on the Thurstone Psychological Examination for 
high-school graduates and college freshmen, 1931 edition, and also the 
scores made by them on the Columbia Research Bureau English Test, 
Form B, for upper high-school grades and colleges. Preliminary tests 
in the code writing and the addition were given. The results of these 
four tests were used as a basis for forming the two equal groups neces- 
sary to carry out the investigation. 

In equating the two groups, one of the methods suggested by 
Garrett¹ was followed. The standard deviation for each of the four
tests was calculated and found to be as follows: Thurstone 47.50, 
English 26.37, code writing 13.32, and addition 5.37. In order to 
make all the sigmas approximately equal (as there seemed to be no 
good reason for giving one test more importance than another) the 
sigma of the Thurstone Examination was multiplied by seven-twelfths, 





1 Garrett, Henry E.: Statistics in Psychology and Education. New York: 
Longmans, Green and Co., 1926, pp. 279-281. 
the English by one, the code writing by two, and the addition by five.
These multipliers gave the following new values for sigma: Thurstone 
27.72, English 26.37, code writing 26.54, and addition 26.85. The 
actual scores of each of the eighty-three subjects were then multiplied 
by these same multipliers and the products added to obtain a single 
composite score for each subject. These scores were then arranged in 
descending order. Beginning at the bottom of the list, the first two 
scores were combined to form the first pair, the next two scores to 
form the second pair, and so on through the whole list until forty-one 
pairs had been formed. Of the forty-one pairs, one member was 
placed in the normally motivated or control group and the other in 
the highly motivated or experimental group. 
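The equating procedure can be sketched as follows: weight each preliminary score by the multiplier reported for its test, sum to a composite, rank, and pair adjacent subjects from the bottom of the list. How the two members of a pair were assigned to the groups is not stated, so the seeded coin flip below is an assumption, as are the dictionary keys used for the four tests.

```python
import random

# Multipliers reported in the text (Thurstone 7/12, English 1, code 2, addition 5).
WEIGHTS = {"thurstone": 7 / 12, "english": 1, "code": 2, "addition": 5}


def composite(scores):
    """Weighted sum of one subject's four preliminary scores."""
    return sum(WEIGHTS[test] * value for test, value in scores.items())


def matched_pairs(subjects, seed=0):
    """Pair adjacent composite scores, starting from the bottom of the ranked
    list, and split each pair between a control and an experimental group.

    subjects maps a subject id to a dict of the four test scores. The member
    sent to each group is chosen by a seeded coin flip (an assumption); with an
    odd number of subjects the highest-ranked one is left unpaired.
    """
    rng = random.Random(seed)
    ranked = sorted(subjects, key=lambda s: composite(subjects[s]))  # lowest first
    control, experimental = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)
        control.append(pair[0])
        experimental.append(pair[1])
    return control, experimental
```

With eighty-three subjects this yields forty-one pairs, as in the text. Applying the multipliers to the reported SDs gives 47.50 × 7/12 ≈ 27.71 and 13.32 × 2 = 26.64, slightly different from the printed 27.72 and 26.54; the source does not say why.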

Between the time that the forty-one pairs described above were 
formed and the actual beginning of the experiment, ten late comers 
were added to the two classes whose complete personnel this experi- 
ment included. As the practice periods had been assigned as a part 
of the regular work in psychology for the two classes before mentioned, 
no one in either class could be omitted from the experiment. Further- 
more, it seemed desirable to have as many subjects as possible, and it 
was hoped that some of these ten new subjects could be used to fill the 
places of any of those who might, for one cause or another, be required 
to drop out. This proved to be the case, and five of these ten were 
used in the experiment. Two pairs of matched subjects were added 
to the original forty-one pairs. This investigation is based upon the 
data secured from these forty-three pairs of subjects. 

This method of forming the two groups made them as much alike 
as possible. A comparison of their means shows that of the control 
group to be 288.7 and the mean of the motivated group to be 287.3. 
The standard deviation of the control group is 79.7, and that of the 
highly motivated group is 79.1. This gave two groups with almost 
exactly the same index of variability. Calculation shows it to be 
.276 for the control group and .275 for the experimental group. 
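The index of variability quoted here is the standard deviation divided by the mean. The reported figures can be checked directly:

```python
def index_of_variability(sd, mean):
    """Index of variability: standard deviation divided by the mean."""
    return sd / mean


control = index_of_variability(79.7, 288.7)
motivated = index_of_variability(79.1, 287.3)
print(f"control {control:.3f}, motivated {motivated:.3f}")  # control 0.276, motivated 0.275
```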

The subjects of this experiment were given credit for participating 
in the experiment as a substitution for their regular laboratory work 
in psychology. Thus the experiment became for them a part of their
regular work. This arrangement solved one of the most pressing
problems of the investigation, i.e., the problem of how motivation was
to be secured for the control group. 

Two charts or graphs were provided so that the progress of the 
motivated group in each material could be shown each day in terms of 
average scores. A list of the names of the members of the motivated
group, together with their scores in both materials, was posted on the 
bulletin board each day between the two graphs. 

The code writing material consisted of mimeographed sheets of 
English which were to be put into the old Civil War code as rapidly as 
possible. The English selections were from Bacon’s Essays, this 
source being chosen because of the general uniformity of the work 
and hence the evenness of difficulty. The key to the code writing is 
given herewith: 
The symbol for each letter consists of that portion of the figure in
which the letter occurs. For example, A is written thus _| and B
thus | |. In the second group the letters are differentiated from those
in the first one by placing a dot in the symbol; and in the third group 
two dots are used. For example, A, J, and S are all written alike 
except that the first is written without a dot, the second has one dot, 
and the third has two dots. Thirty different specimen sheets were 
prepared in the code writing and a different one was used every day. 
The score in code writing was always the number of letters correctly 
translated into the symbols in five minutes. 
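Since the key figure is not reproduced here, the following is only a sketch under a stated assumption: the letters fill a three-by-three grid row by row three times (A to I plain, J to R with one dot, S to Z with two dots), which agrees with the text's statements that A is written _| and that A, J, and S differ only in their dots. Each symbol is represented as the set of drawn cell walls plus a dot count, and the score is the number of letters correctly rendered, compared position by position.

```python
import string


def letter_symbol(letter):
    """(walls, dots) for one letter under the assumed row-by-row grid layout."""
    index = string.ascii_uppercase.index(letter.upper())
    dots, cell = divmod(index, 9)   # 0, 1, or 2 dots; cell 0..8 in the grid
    row, col = divmod(cell, 3)
    walls = set()
    if row > 0:
        walls.add("top")
    if row < 2:
        walls.add("bottom")
    if col > 0:
        walls.add("left")
    if col < 2:
        walls.add("right")
    return frozenset(walls), dots


def encode(text):
    """Symbols for the letters of text, skipping spaces and punctuation."""
    return [letter_symbol(ch) for ch in text if ch.upper() in string.ascii_uppercase]


def code_writing_score(text, written):
    """Letters correctly translated, comparing the subject's symbols in order."""
    return sum(1 for want, got in zip(encode(text), written) if want == got)
```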

The addition material consisted of mimeographed sheets of columns 
of numbers. Each column was made up of ten single digits (the 1’s 
and 0’s were omitted). Each sheet contained three rows of these 
columns, twenty-one to the row. Ten different sheets were prepared 
and rotated over a period of ten practices. The score in addition was 
always the number of columns added correctly in five minutes. 
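A sketch of how sheets of this kind could be generated and scored, assuming random digits from 2 to 9 and assuming the columns are worked row by row. The function names are illustrative, not taken from the study.

```python
import random

DIGITS = range(2, 10)  # 1's and 0's omitted, as in the text


def make_sheet(rng, rows=3, columns_per_row=21, digits_per_column=10):
    """One practice sheet: 3 rows of 21 columns, each of ten digits from 2-9."""
    return [[[rng.choice(DIGITS) for _ in range(digits_per_column)]
             for _ in range(columns_per_row)]
            for _ in range(rows)]


def sheet_for_day(sheets, day):
    """Rotate the prepared sheets over successive practices (day counts from 0)."""
    return sheets[day % len(sheets)]


def score_sheet(sheet, answers):
    """Columns added correctly. answers lists the subject's sums in the order the
    columns are worked (row by row); None marks a skipped column."""
    columns = [col for row in sheet for col in row]
    return sum(1 for col, ans in zip(columns, answers) if ans == sum(col))


rng = random.Random(1)
sheets = [make_sheet(rng) for _ in range(10)]  # ten sheets, as in the text
```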

All experiments for both groups were administered in the same 
laboratory, which had appropriate facilities for good working condi- 
tions. The students who took part in this experiment were told that 
the practice work which they were to do was a part of their regular 
course and that they were expected to carry out instructions and 
coöperate in every way, just as in any other part of their work in
psychology. This was explained to them by their regular instructor. 
At the same time, they were reminded by him that the essence of an 
experiment is the controlling of conditions under which it takes place 
and that, therefore, they were not to practice the various tasks which
the experiment involved, except at the regular experiment periods. 
He also warned them that failure to comply with this request was 
easily detected by the person conducting the experiment. The 
subjects were told enough about the aim of the experiment to enable 
them to follow instructions intelligently. That is, they were told that 
the experiment itself was an investigation in the field of learning and 
that the two groups would not follow identical procedures.
In the explanation of the experiment, the point was emphasized that 
the students were to receive credit, not for how much of each material 
they were able to cover in a certain time, but for how well they carried 
out whatever instructions were given them. It was made plain to 
them that differences in scores were to be expected, and that a low 
score was no reflection on its maker, while a high score, though desira- 
ble, was evidence of individual ability along a certain line rather than 
of superior intelligence, and that, in any event, the thing of importance 
was to see how much each person could gain each day, regardless of 
what his original score was. 

A part of the regular chapel period was used as the meeting time 
for the control group which met, therefore, every morning at 10:15. 
Charts of the seating arrangement of the room were drawn up. These 
charts permitted the checking of attendance while the tests were in 
progress. Code writing and addition materials were distributed each 
day before the laboratory door was opened and the students allowed 
to come in. This material was laid face down on the arms of the chairs.
At 10:15 the doors were opened and the subjects were permitted to 
enter the laboratory and take their places. The subjects turned over 
the two practice sheets and wrote their names on them, together with 
the date, and then turned them wrong side up. When all the subjects 
were ready to begin, they would be told, “When the clock moves the
next time, we will begin.” When the clock made the click character-
istic of electric clocks, they were told to go. At the word “go,” all 
would turn their papers over and begin on either the code writing or 
the addition, depending on the instruction for that particular day. 
A notice telling the subjects which material to do first each day was 
kept on the blackboard at the front of the room, along with the date. 
This was changed each day, so that the code writing which came first 
the first day, came second the second day, etc., throughout the whole 
six weeks of practice. At the end of five minutes the subjects were 
told, “Begin now on the addition,” or, “Begin now on the code writing,”
depending on which material had been used to start with. The
subjects would stop work only long enough to change from one mate- 
rial to the other, never more than a few seconds. After five more 
minutes had elapsed, they were told, “Time up.” At the words 
“time up,” the subjects would stop at once, no matter where they 
happened to be, and turn their papers over as they had been instructed 
to do. They had been warned beforehand that they were not to count
their scores from day to day, or to try to keep up with what they made 
in any way. The practice sheets were always left on the arms of the 
chairs and collected at once, placed in envelopes, and labeled. 

The daily procedure for the motivated group was exactly the same
as that described for the control group, with a few additions. No 
attempt was made to keep the subjects of this group from counting 
their scores immediately after the tests each day. Two large charts 
were posted each day before the members of this group came into the 
room. These charts showed the position of the group in each material 
as shown by the average score. In addition to these charts, the scores 
of all subjects for the previous day in both materials were typed and 
posted on the bulletin board each day, with the highest score and the 
corresponding name first, and the second highest next, on down to the 
lowest. When they came to take their tests, the subjects of this group 
were allowed a few minutes each day to look at their scores for the 
previous day. The time for the meeting for the motivated group was 
3:30 in the afternoon. 

The practices were held every school-day (five days a week) over a 
period of six weeks. The time proved to be well chosen in that during 
the whole practice time of the experiment there was not a single holiday 
to mar the continuity of the work. Whenever an absence occurred, 
the absentee was required to come in and make it up as soon as possible 
by appointment. The test papers for both groups were scored each 
day and the scores entered on a large chart which was prepared in such 
a way as to provide a place for each score in each material every day 
for every subject. A brief daily record of the routine procedure of the 
experiment which was kept showed absences to be less than two per 
cent. 


RESULTS 


Before passing on to the results of this experiment, a word 
should be said about the statistical measures used in handling those 
results. 
It would be unprofitable and unwise to enter into a discussion here
of the relative merits of the various measures of variability suggested 
by previous investigators of the problem of the effect of practice on 
changes in variability.¹

The most significant point of difference is the question of whether 
absolute or relative measures shall be used for determining progress. 
After a careful review of the literature on this point, the writer cast 
his lot in with those who favor relative measures, among others Reed 
and Peterson. 

The four measures chosen are given herewith: (1) The index of
variability as measured by the formula SD/Average; (2) The index of
variability as measured by the formula (90P − 10P)/median; (3) The ratio of
the average of the initially three highest to that of the initially three lowest
at the beginning of practice, as compared with the ratio of the averages
of the same individuals at the end of practice; (4) The correlation
between initial ability and relative gain.
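The four measures can be sketched as below. The study does not state how its percentiles were interpolated or whether SD used n or n − 1, so the use of `statistics.quantiles` (default method) and the population SD are assumptions. Measure (3) is written as highest over lowest, the form used in the tables (for example 5.68 for the initial practice in code writing), and "relative gain" is taken as per cent of gain over the initial score.

```python
from statistics import mean, median, pstdev, quantiles


def sd_over_mean(scores):
    """Measure (1): SD / average."""
    return pstdev(scores) / mean(scores)


def percentile_range_over_median(scores):
    """Measure (2): (90th percentile - 10th percentile) / median."""
    deciles = quantiles(scores, n=10)  # nine cut points: P10 ... P90
    return (deciles[8] - deciles[0]) / median(scores)


def high_low_ratio(initial, final, k=3):
    """Measure (3): mean of the k initially highest over mean of the k initially
    lowest, at the start and for the same individuals at the end."""
    order = sorted(range(len(initial)), key=lambda i: initial[i])
    low, high = order[:k], order[-k:]

    def ratio(scores):
        return mean(scores[i] for i in high) / mean(scores[i] for i in low)

    return ratio(initial), ratio(final)


def initial_vs_percent_gain(initial, final):
    """Measure (4): Pearson correlation of initial score with per cent of gain."""
    gains = [100 * (f - i) / i for i, f in zip(initial, final)]
    mx, my = mean(initial), mean(gains)
    sxy = sum((x - mx) * (y - my) for x, y in zip(initial, gains))
    sxx = sum((x - mx) ** 2 for x in initial)
    syy = sum((y - my) ** 2 for y in gains)
    return sxy / (sxx * syy) ** 0.5
```

On all four measures a drop from initial to final practice, or a negative correlation, indicates that practice has reduced relative variability.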

Attention has been called earlier to the differences between the 
conditions of motivation under which the control and motivated groups 
worked. It is the purpose here to show that the special motivation 
given the experimental group did produce a differential effect upon 
achievement. 

Figures 1 and 2 are graphic records of the progress made from day 
to day in achievement, as measured in terms of the daily average of 
the group, by both groups in code writing and in addition. These 
records show that, although the motivated group made a lower score 
in both cases to start with, the average score of that group in code 
writing caught up with and exceeded that of the control group at
the sixth practice. The same thing occurred in addition at the ninth
practice. From the sixth practice in code writing, the motivated 
group maintained its lead through the thirtieth meeting. From the 
ninth practice in addition, the motivated group held its lead through 
the thirtieth practice in spite of the irregularity at times in the progress 
of both groups.

In interpreting Figures 1 and 2, it should be remembered that what 
is shown is not the difference in daily progress between a motivated
1 The reader who is interested in following these arguments is referred to the 
bibliography contained in the article by the writer already referred to, and his 
attention is especially directed to articles by Reed, Chapman, Anastasi, and 
Peterson who have given excellent treatments of this question. 
group and a non-motivated group, but the difference in progress
between a control group rather well motivated and an unusually well- 
motivated experimental group. All subjects in both groups had 
credit to gain in psychology by complying with the conditions of the 
experiment; the main one of these conditions was to try to improve 
their scores as much as possible each day. This was a strong motivat- 
ing factor. In addition to this, the experimental group was motivated 
by a knowledge of results and competition. What is shown in Figures 
1 and 2, therefore, is a graphic representation of the effect of a strong 


TABLE I. SUMMARY OF DIFFERENCES IN CHANGES IN VARIABILITY FOR THE
CONTROL AND MOTIVATED GROUPS IN CODE WRITING

                                     Practice       Amount of change    Greater
Measure               Group        Initial  Final   Increase  Decrease  change
SD/Average            Control        .42     .21                .21     Control
                      Motivated      .44     .27                .17
(90P − 10P)/median    Control       1.10     .66                .44
                      Motivated     1.22     .72                .50     Motivated
Av. 3 high/Av. 3 low  Control       5.68     .87               4.81     Control
                      Motivated     7.07    2.34               4.73
Correlation of initial score and per cent of gain
                      Control       −.69                                Control
                      Motivated     −.58

motivation as compared with normal motivation in code writing and 
addition, in terms of the daily averages of the groups. 

The data presented herewith show that certain differences occurred
in variability between the control and motivated groups in code writing.
As these groups had been carefully equated, and all conditions
had been controlled as closely as possible during the experiment,
it seems legitimate to attribute this difference in change in
variability to the difference in the conditions of motivation under which the
two groups worked.

These differences are as follows: The measure SD/Average showed
that the variability of the control group was reduced from .42 to .21
through practice, a change of .21 in variability. The formula
(90P − 10P)/median showed this change to be from 1.10 to .66, a difference of
.44. The ratio of the three highest to the three lowest was 5.68 for
the initial practice, and the ratio for the final practice of the same
individuals was .87, a change of 4.81. The correlation between initial
score and per cent of gain was found to be −.69.

The corresponding changes in variability for the highly motivated
group are as follows: for SD/Average, a change from .44 to .27, or a
difference of .17; for (90P − 10P)/median, a change from 1.22 to .72, a difference
of .50; for the ratio of the three highest to the three lowest, a change
from 7.07 to 2.34, or a difference of 4.73. Initial score correlated with
per cent of gain −.58.

These data are presented in Table I in a compact form so that com- 
parisons may be readily made. 

It would be difficult to draw a definite conclusion from Table I 
concerning the effect of motivation on changes in variability in code 
writing. The three measures which are in agreement indicate that the 
motivation tended to lessen the reduction in variability during 
practice. 

Reference to Figure 1 will show that in code writing, the motivated 
group crosses over the control group in achievement on the sixth day of 
practice and maintains its lead from there on. Further examination 
of the same graph will show that at the twenty-first practice the control 
and motivated groups in code writing were further apart than at any 
other point during the whole thirty practice periods. Therefore, it 
seemed reasonable to suppose that, provided the effect of motivation 
upon variability was to lessen its reduction as compared with what 
would have normally taken place, measures applied at these two new 
points (the sixth and twenty-first practices) should show this tendency 
clearly. 

Accordingly, a complete recalculation of the data on code writing was
made at these new points in practice rather than at the initial and final
points of practice.

In Table II these data are presented in compact form for the code 
writing from the sixth to the twenty-first practice. 

Table II indicates rather definitely that, in the case of the code 
writing, the effect of motivation was to prevent the variability from 
decreasing to the same extent as it did in the control group. In this 
instance, although the differences are in some cases small, the four 
measures are in agreement and the effect found here is the same as


that indicated by Table I. 


TABLE II. SUMMARY OF DIFFERENCES IN CHANGES IN VARIABILITY FOR THE
CONTROL AND MOTIVATED GROUPS IN CODE WRITING FROM THE SIXTH TO THE
TWENTY-FIRST PRACTICE

TABLE III. SUMMARY OF DIFFERENCES IN CHANGES IN VARIABILITY FOR THE
CONTROL AND MOTIVATED GROUPS IN ADDITION

                                     Practice       Amount of change    Greater
Measure               Group        Initial  Final   Increase  Decrease  change
SD/Average            Control        .43                        .11     Equal change
                      Motivated      .42
(90P − 10P)/median    Control       1.10                        .13
                      Motivated     1.02                        .14     Motivated
Av. 3 high/Av. 3 low  Control       5.02                       2.49
                      Motivated     6.58                       3.78     Motivated
Correlation of initial score and per cent of gain
                                    −.61

Table III shows what happened in the case of the addition with
respect to variability. 
Table III indicates that the effect of added motivation on practice
in addition was to decrease the variability further.
It is to be noted, however, that in spite of the fact that three of the 
measures of variability are in agreement the differences in some cases 
are very small. It would probably be more accurate in the case of the 
addition to say the effect of motivation, as used in this experiment, 
upon changes in variability is negligible. 

Since the results for the effect of motivation on variability for addi- 
tion are contrary to those for code writing, it was felt desirable to 
calculate the variability for each practice of the whole experiment for 
both functions studied. In making this calculation the measure 
SD/Average was chosen as being most suitable because of its wide
use by other investigators. This new calculation gave a complete 
picture of what occurred during the whole experiment in terms of 
variability. 

Figure 3 presents a graph of the daily fluctuations in the index of
variability (V) for the control and motivated groups in code writing
and addition. This graph shows clearly that in the case of code 
writing the variability of the motivated group did not decrease 
nearly so much as was the case with the control group. This picture of 
what happened agrees with the result obtained from a comparison 
of the initial and final, as well as the sixth and twenty-first practices. 
In the case of the addition, a comparison of the initial and final 
practices had led to the belief that the effect of the motivation was 
negligible. This is borne out by the second part of this same figure. 
CONCLUSIONS


Upon the basis of the data presented in this study and the calcula- 
tions based upon these data, the following conclusions seem to be 
upheld. 

1. The effect of a special, added, or a high degree of motivation upon 
changes in variability is not constant, but varies with the function 
studied. 

2. With the function code writing, added motivation acts to lessen 
the reduction in variability during practice. 

3. Upon the basis of the above conclusion it may also be concluded 
that the added motivation stimulates most strongly those initially 
high in code writing. 

4. With addition, the effect of added motivation is negligible, the 
added stimulation being felt uniformly by the whole group. 








THE RELATION BETWEEN ABILITIES AND 
IMPROVEMENT WITH PRACTICE 


HERBERT WOODROW 
University of Illinois 


The chief purpose of the present analysis was to determine in the 
case of a number of tests the effect of practice on the factor-loadings. 
Changes in the factor-loadings of a test-performance would indicate a 
change in the degree to which the performance depended upon the 
various abilities possessed by the subjects; and, if such changes occur, 
it should be of value to know their nature. For example, it is of 
interest to know whether with practice performances show an increased 
dependence upon a speed factor, or, in case such a factor were found, 
upon Tt _ 

It was also hoped to identify the factors upon which gain scores 
depend, and thus incidentally determine whether the gain scores in 
different performances depend upon the same factor or factors. In 
this respect the investigation was not very successful, but, neverthe- 
less, some rather interesting conclusions concerning gain scores appear 
to be indicated. Gain scores are, of course, not independent scores, 
since they are differences between final and initial scores. As a result, 
unless both final and initial scores are in terms of the same units, the 
gain scores will be distorted. The units here used, with one exception, 
are raw score units. These units have significance, but there are 
other units which may be more significant. While, for example, the 
number of units of work done in a given time constitutes a valid, 
practical, and self-descriptive measure, an increase in the number of 
units done from ninety to ninety-five may represent a much greater 
increase in ability than an increase from forty to forty-five, or vice 
versa. Were the raw scores transformed into units corresponding to 
equal steps in ability, in other words, subjected to absolute scaling, 
there is little doubt but that the conclusions here reached concerning 
factors in gain scores would be considerably modified. 
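A small illustration of the point about units: two raw gains of five units can rank either way once the scores are re-expressed on another scale. The logarithmic and squared scales below are purely hypothetical stand-ins for "absolute scaling", chosen only to show that the ordering of gains depends on the unit; they are not the scaling any investigator used.

```python
import math


def gain(initial, final, scale=lambda x: x):
    """Gain score: the scaled final score minus the scaled initial score."""
    return scale(final) - scale(initial)


def square(x):
    return x * x


raw = (gain(90, 95), gain(40, 45))                          # equal in raw units
logged = (gain(90, 95, math.log), gain(40, 45, math.log))   # 40 -> 45 now larger
squared = (gain(90, 95, square), gain(40, 45, square))      # 90 -> 95 now larger
print(raw, squared)  # (5, 5) (925, 425)
```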

The data consist of the raw scores made by fifty-six subjects who 
completed thirty-nine days of practice in each of seven tests, all of 
which were given as group tests, and who, in addition, took a number 
of end tests, including intelligence tests, either before or after practice. 
In none of the practice tests, with the exception of the speed of making 
gates, was the same form used two days in succession. In the speed 


test, only one form is feasible. The scores made in all tests (including,
in the case of the practice tests, initial, final, and gain scores: a total
of thirty-three variables) were intercorrelated and the correlational
matrix subjected to a factor analysis by Thurstone’s centroid method. 
The factors were then rotated by the graphic method so as to maximize 
the number of insignificant factor loadings. 
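The first step of Thurstone's centroid method can be sketched as follows: put communality estimates in the diagonal, sum each column, and divide each column sum by the square root of the grand total to get the first-factor loadings; the residual matrix then feeds the next extraction. Using each variable's highest absolute correlation as its communality estimate is a common convention and an assumption here, and the sign reflection needed before extracting later factors is omitted.

```python
def first_centroid_factor(r, communalities=None):
    """First centroid factor of a correlation matrix given as a list of lists.

    Diagonal cells are replaced by communality estimates; if none are given,
    each variable's highest absolute correlation with another variable is used
    (an assumed convention). Returns the loadings and the first residual matrix.
    """
    n = len(r)
    m = [row[:] for row in r]  # work on a copy
    for j in range(n):
        if communalities is not None:
            m[j][j] = communalities[j]
        else:
            m[j][j] = max(abs(m[j][k]) for k in range(n) if k != j)
    column_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    total = sum(column_sums)
    loadings = [s / total ** 0.5 for s in column_sums]
    residual = [[m[i][j] - loadings[i] * loadings[j] for j in range(n)]
                for i in range(n)]
    return loadings, residual
```

For a matrix generated by a single factor, with the true communalities supplied, the method recovers the loadings exactly and leaves a zero residual.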

The seven tests used as practice tests were as follows: 


1. Horizontal Adding. Thirty problems in addition, each problem con-
sisting of adding six numbers, varying in length from three to seven places
and arranged in a horizontal line. Ten forms. Time, ten minutes. Score,
the total number of correct digits in the correct place in the answers.

[FIG. 1. Modified spot-pattern test. A, stimulus card; B, exposure frame; C, test form.]


2. Substitution. Writing a digit under each letter of a page of evenly
spaced letters in accordance with a key list of paired digits and letters. The
test sheets contained thirty lines of capital letters. Each line was made up
of eight letters BFHKMTWZ, double spaced and arranged in irregular
order. A new key was used each day, but otherwise the test sheet was the 
same throughout. Time, ten minutes. Score, number correct minus number 
of errors. 

3. Spot-pattern Test, Modified. On twenty-four exposure cards, ten and
five-tenth inches square, were stamped black disks, one-half inch in diameter, 
in irregular patterns varying in number from four to nine, four of each num- 
ber. The spots, or disks, were so placed that they always fell at one of the 
intersections of an imaginary cross-section sheet composed of squares one 
and five-tenth by one and five-tenth inches. The cross-section lines were not 
drawn on the stimulus-cards but were indicated by short lines drawn on a 
framework in which the cards were exposed (see Fig. 1). Each card was 
exposed to the subjects as a group for fifteen seconds. The subjects were 
provided with response sheets on which were printed twenty-four rectangular
forms. Each of these forms was one and five-eighth inches square and con- 
sisted of a set of thirty-six dots representing the intersections of imaginary 
cross-section lines. Each of the exposed patterns had to be reproduced by 
encircling the proper dots on one of the response-forms, after removal of the 
stimulus-card. The time allowed for reproducing each pattern was thirty 
seconds. Thirty-six forms were used, that is, thirty-six different sets of 
twenty-four stimulus-patterns each. The response-sheets were of course
the same throughout. The original score on the spot-pattern test was the
percentage of errors. The assumption was then made that goodness of
performance varies (inversely) with the σ value of these per cent scores, and
such σ values were the scores used.

4. Anagrams. The task consisted in rearranging letters to make words.
The test-sheets were composed of one hundred twenty sets of letters, arranged 
in four columns of thirty each. The number of letters per word in the 
columns increased from four in the first to seven in the last. The thirty- 
nine completely different forms required a list of five thousand seventy words. 
Time, ten minutes. Score, total number of words correct. 

5. Cancellation with Multiple Instruction—The test blanks provided 
forty-nine lines of all letters of the alphabet printed as capitals with uniform 
spacing in eleven point, Intertype Scotch, with approximately thirty-eight 
letters to a line. Nine forms were used. When required, several sheets 
of the same form were distributed for one day’s practice. The instructions, 
modelled after those used by Philip,! were as follows: 

“1. Draw a line through each vowel (A, E, I, O, U) which comes between
two consonants, i.e., which stands alone between two consonants.

“2. Where there are two vowels and nothing else between a pair of
consonants, draw a line through the second one.

“3. Where there are three or more vowels between two consonants, do
nothing.

“Work as rapidly and as accurately as you can. Your score depends upon
speed and accuracy.”

Time, ten minutes. Score, number of correct cancellations minus number 
wrong. 

6. Length Estimation.²—The subject was required to estimate the length
of the right-hand segment of a black, one-meter rod divided by a white
pointer, as a percentage of the total length of the rod. The dividing pointer
was given one hundred different settings at each practice period. Each list
of one hundred settings was used only once. Score, total of arithmetical
deviations of estimated percentages from the correct percentages.

¹ Philip: The Measurement of Attention. Catholic University of America,
Studies in Psychology, Vol. I, No. 2, 1929.

² This test was devised and given by F. L. Ruch. He also provided the material
used in the anagrams test and participated in the conduct of the experiment in
various ways.

7. Making Gates.—A speed test in which the subjects were to make “gates”
consisting of four horizontal and one diagonal line in each square of a page 
on which was printed a rectangle seven by nine inches divided into one 
thousand eight squares. Time, ten minutes, divided into five two-minute 
periods separated by rest periods of one minute. Score, number of gates 
(five lines each) completed. 


Since a considerable number of tests is required to obtain a mean- 
ingful set of reference abilities, other tests, termed end-tests, were 
given, both before and after the thirty-nine days devoted to practice. 
Inasmuch as the factors in improvement were unknown at the time of 
testing, these additional tests had to be chosen rather blindly. Had 
the outcome been known in advance, a longer and probably better 
list of tests would have been used.¹

The end-tests were as follows: 


1. Thorndike CAVD, Levels M, N, O, P, and Q, Form 2.—Time allowed, 
two hours, forty minutes. Most of the subjects claimed they had answered 
all the questions they were able to answer considerably before the booklets 
were collected, and were allowed to cease working on them. This test was 
given after the practice sittings were ended. 

2. Otis Group Intelligence Scale, Advanced Examination, Forms A and B.— 
Only six of the tests were given; namely, those labelled Directions, Proverbs, 
Arithmetic, Geometric Figures, Similarities, and Narrative Completion. 
The times allowed were not identical with those specified in the manual of 
directions, and were shorter with form B, given second, than with form A,
given first. The twelve scores obtained with the two forms were summed 
into one total score. Both forms were given before the practice sessions 
began—form B, by three days, and form A, by ten days. 

3. Analogies.—Two forms, especially prepared, each containing sixty
items, with the correct answer to be indicated by underlining one of five 
given choices. One form was given before, and one form after practice. 
Time, five minutes, thirty seconds. Score, number right. 

4. Form Analogies, and 5. Artificial Language.—Tests 4 and 5 were the
ones contained in the Psychological Examination published by the American
Council on Education. Two scores were obtained with each test. The first
score was the average made on two forms, 1930 and 1931, given ten days 
apart, before practice, and the second score was that made on the 1932 form, 
given after practice. 





¹ A second experiment has now been completed with a larger number of
subjects, a greater number of practice periods, and a larger and, it is
believed, better set of “end” tests. In the factor analysis of the data, gain
scores will be omitted.
6. Thurstone’s Categories Test.—Marking each of six words as 1 or 2, to
indicate to which of two categories it belongs, the nature of the two
categories being indicated by two adjacent lists of four words each. A total of
thirty-two sets of six words each, to be so marked. Fore-practice was pro- 
vided. The test was given before practice, in two segments, separated by 
ten days. Total time, exclusive of fore-practice, fourteen minutes. Score, 
number of words correctly marked. 

7. Mental Multiplication.—Twenty-five problems. Answers to be written
upon the signal “write.” Time of ten seconds allowed for the five two-by-one
place multiplications and fifteen seconds for the twenty three-by-one place 
multiplications. Score, number of digits correct. The score used is the 
average of two scores from two forms both given before practice. 

8. Speed of Making Crosses.—The test sheet was the same as that for the 
“making gates” practice test. One cross to be placed in each square. Time, 
six minutes, preceded by one minute of fore-practice followed by one minute 
of rest. Score used, average of two trials, both given before practice. 

9. Three-digit Cancellation.—The subjects were instructed to cancel every 
2, 4, and 9 on a printed page consisting of the digits 0 to 9, inclusive, arranged 
in irregular order and evenly spaced. Time, five minutes.


It is extremely important in studies of practice to have highly 
reliable initial and final scores. As is well known, unless the scores 
are highly reliable, the correlation between initial scores and gains 
appears less positive or more negative than the true correlation. High 
reliability of initial and final scores was secured, in part, by using 
relatively long tests (no test requiring less than ten minutes), and also 
by amalgamating the scores made at several sittings. What is here 
termed the initial score is the average of the first several days of 
practice, preceded by several minutes of fore-practice and by a number 
of mental tests given before the experiment proper began in order to 
accustom the subjects to group-test procedure. What is called the 
final score is also the average of several scores. As a result, the 
Spearman-Brown reliability coefficients are all over +.90 and average 
+.94, and are as high for initial scores as for final scores. The 
reliability of the various scores, as well as the mean and the standard 
deviation of their distribution, is shown in Table I. 
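The Spearman-Brown formula predicts the reliability of a score pooled from k parallel units from the reliability r of a single unit: r_k = kr / (1 + (k − 1)r). The text does not say exactly which units were stepped up. The sketch below simply assumes that single-day scores are the units. The inter-day correlation of .75 is a hypothetical figure chosen for illustration:

```python
def spearman_brown(r_single: float, k: float) -> float:
    """Spearman-Brown prophecy formula.

    r_single: reliability (correlation) of one unit, e.g. one day's score.
    k: number of parallel units pooled, e.g. the number of days averaged.
    Returns the predicted reliability of the pooled score.
    """
    return k * r_single / (1.0 + (k - 1.0) * r_single)


# Hypothetical: single-day scores correlate .75 with one another.
print(spearman_brown(0.75, 5))  # 0.9375, reliability of a five-day average
print(spearman_brown(0.75, 3))  # 0.9, reliability of a three-day average
```

On this assumption, pooling five days whose single-day reliability is only .75 already yields about .94, the average reported above. This shows why amalgamating several sittings was an effective way to secure reliable initial scores.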

Twenty-one of the total of thirty-three variables which were 
intercorrelated consisted of the three scores, initial, final, and gain 
scores, from each of the seven practice tests; the other twelve were 
scores from the end tests. There were only nine different end tests, 
but in the case of three of these tests; namely, artificial language, form 
analogies, and verbal analogies, scores obtained from two different 
forms, given before and after the practice sittings, were retained as 
separate variables. The five hundred twenty-eight coefficients of
correlation between the pairs of these thirty-three variables were
calculated by the Pearson product-moment method. As a sufficiently
accurate estimate of the communalities required for the diagonal
entries in the correlational matrix, the highest correlation of each
variable with any of the others was used. A new estimate made in
the same way was used in each matrix of residual correlations. The
nine factors obtained by the centroid method¹ are shown in Table II.


TABLE I.—DATA CONCERNING PRACTICE TESTS
r = Spearman-Brown reliability coefficient

                        Initial score                 Final score
Test                    Days     r     Mean      σ    Days       r     Mean      σ
Horizontal adding       1st 5  .938     47.1   11.3    last 3  .976     85.6   21.2
Substitution            1st 5  .944    395.8   61.3    last 3  .942    487.2   96.1
Spots                   1st 5  .949    +.458   .397    last 3  .926   +1.763   .603
Relative length         1st 5  .906    103.0   32.2    last 3  .920     88.0   31.4
Multiple cancellation   1st 2  .954    280.2   59.9    last 2  .960    576.0  112.1
Speed, gates            1st 3  .965    506.0   79.7    last 3  .938    617.5   79.0
Anagrams                1st 5  .938     34.1    7.4    last 3  .943     51.2    8.2
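The communality estimate and the centroid extraction described above can be sketched in Python as follows. This is an illustrative reconstruction, not the computation actually performed. In particular, the sign-reflection rule used here (reverse the variable whose off-diagonal column sum is most negative, until no sum is negative) is a simplification assumed in place of Thurstone's full procedure. All function names are hypothetical:

```python
import math


def highest_r_diagonal(R):
    """Copy R, putting in each diagonal cell the variable's largest absolute
    correlation with any other variable (the communality estimate in the text)."""
    n = len(R)
    out = [row[:] for row in R]
    for j in range(n):
        out[j][j] = max(abs(R[i][j]) for i in range(n) if i != j)
    return out


def centroid_factor(R):
    """Extract one centroid factor from R (diagonal already holds communalities).

    Returns (loadings, residual_matrix). Sign reflection uses a simplified,
    assumed rule: repeatedly reverse the variable whose off-diagonal column
    sum is most negative.
    """
    n = len(R)
    S = [row[:] for row in R]
    signs = [1] * n
    while True:
        sums = [sum(S[i][j] for i in range(n) if i != j) for j in range(n)]
        worst = min(range(n), key=lambda j: sums[j])
        if sums[worst] >= 0:
            break
        signs[worst] = -signs[worst]
        for k in range(n):
            if k != worst:  # the diagonal cell keeps its sign
                S[worst][k] = -S[worst][k]
                S[k][worst] = -S[k][worst]
    col_sums = [sum(S[i][j] for i in range(n)) for j in range(n)]
    total = sum(col_sums)
    if total <= 0:
        raise ValueError("no common variance left to extract")
    # Loading = column sum / sqrt(grand sum), then undo the reflections.
    loadings = [signs[j] * col_sums[j] / math.sqrt(total) for j in range(n)]
    residual = [[R[i][j] - loadings[i] * loadings[j] for j in range(n)]
                for i in range(n)]
    return loadings, residual


def centroid_factors(R, n_factors):
    """Extract n_factors in succession, re-estimating the diagonal of each
    residual matrix by the same highest-r rule."""
    factors, current = [], R
    for _ in range(n_factors):
        loadings, current = centroid_factor(highest_r_diagonal(current))
        factors.append(loadings)
    return factors


# Hypothetical four-variable matrix generated by a single factor.
a = [0.8, 0.7, 0.6, 0.5]
R = [[x * y if i != j else 1.0 for j, y in enumerate(a)] for i, x in enumerate(a)]
print([round(v, 3) for v in centroid_factors(R, 1)[0]])
```

With exact communalities on the diagonal, a single centroid factor recovers one-factor loadings exactly. With the highest-r estimate it gives only an approximation, which is one reason the text treats the results as preliminary.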

The test numbers in Table II (as also in Table IV) stand for the 
following test scores: 


1. Horizontal Adding, initial
2. Horizontal Adding, final
3. Horizontal Adding, gain
4. Substitution, initial
5. Substitution, final
6. Substitution, gain
7. Spots, initial
8. Spots, final
9. Spots, gain
10. Multiple Cancellation, initial
11. Multiple Cancellation, final
12. Multiple Cancellation, gain
13. Relative Per Cent Length, initial
14. Relative Per Cent Length, final
15. Relative Per Cent Length, gain
16. Speed, gates, initial
17. Speed, gates, final
18. Speed, gates, gain
19. Anagrams, initial
20. Anagrams, final
21. Anagrams, gain
22. Artificial Language (A, before practice)
23. Artificial Language (B, after practice)
24. Form Analogies, A.
25. Form Analogies, B.
26. Verbal Analogies, A.
27. Verbal Analogies, B.
28. Thorndike CAVD
29. Otis, average of six tests of Form A and six of Form B
30. Categories
31. Cancellation, 3-digit
32. Arithmetical Problems
33. Speed, making crosses





¹ The procedure followed was that described by Thurstone in The Vectors of
Mind, 1935, Ch. III.
TABLE II.—ORIGINAL FACTOR LOADINGS

The factors are designated by the roman numerals I to IX.
h² = proportion of the total variance of the test due to the nine common
factors.

Test      I     II    III     IV      V     VI    VII   VIII     IX     h²
   1   .588  −.180  −.145   .202  −.092   .125   .047  −.256  −.296   .619
   2   .563  −.364  −.288   .386  −.282   .268  −.256  −.066  −.100   .913
   3   .347  −.343  −.262   .314  −.317   .192  −.361   .098   .103   .693
   4   .482  −.337  −.215  −.220   .336   .083   .086   .183  −.160   .627
   5   .466  −.632   .022  −.358  −.134   .268   .226  −.187  −.151   .944
   6   .190  −.449   .189  −.230  −.381   .224   .110  −.262  −.015   .603
   7   .656   .242  −.249  −.158   .151   .064   .069  −.136   .112   .639
   8   .529   .288  −.507  −.361  −.127   .313   .205  −.072   .241   .970
   9   .130   .144  −.421  −.289  −.322   .291   .117   .074   .132   .523
  10   .489  −.263  −.172  −.099   .292  −.344   .284   .117  −.167   .674
  11   .614  −.500   .296  −.107  −.081  −.312   .209   .176   .196   .943
  12   .499  −.308   .580   .129  −.246   .071   .265   .322   .230   .989
  13   .355   .386  −.449  −.245   .018  −.115  −.084   .444  −.209   .798
  14   .319   .130  −.160  −.495  −.174  −.516  −.370   .123   .179   .870
  15  −.091  −.184   .177  −.238  −.066  −.323  −.238  −.347   .268   .488
  16   .506  −.241  −.151   .331   .456  −.116  −.123  −.117   .231   .750
  17   .649  −.304  −.044  −.256   .233   .068  −.249   .126  −.196   .756
  18   .045  −.009   .003  −.477  −.208   .439  −.164   .136  −.441   .706
  19   .414   .274  −.275   .435  −.109  −.445   .251  −.264  −.250   .916
  20   .429   .147  −.343   .547  −.078  −.296   .067  −.143  −.035   .742
  21   .069  −.107  −.139   .181  −.011   .094  −.154   .043   .234   .158
  22   .609   .302   .286   .144   .209  −.027  −.077  −.033  −.074   .622
  23   .687   .306   .197   .123   .096   .128   .225   .059   .063   .703
  24   .681   .276   .298  −.071  −.077   .019  −.125  −.140  −.116   .689
  25   .672   .234   .097  −.207   .109   .029  −.207   .060   .120   .632
  26   .576   .440   .292   .317   .227  −.015  −.122  −.160   .081   .810
  27   .517   .446   .315   .198   .336  −.019  −.116  −.123  −.041   .748
  28   .648   .335   .300   .123  −.282  −.184  −.091  −.071  −.108   .776
  29   .738   .433   .395   .060   .023   .030   .010  −.060  −.025   .900
  30   .528   .336   .345   .279   .407   .129   .063   .113   .194   .825
  31   .469  −.425   .232  −.043   .045  −.064   .225   .344  −.315   .729
  32   .399   .333  −.144   .094  −.289   .138   .195   .199   .021   .480
  33   .450  −.387  −.165  −.045   .378  −.186   .018  −.113   .152   .595
RESIDUALS AFTER REMOVAL OF NINE FACTORS

Magnitude       Frequency      Magnitude       Frequency
+.270 to +.251          1      −.010 to −.029         85
+.250 to +.231          0      −.030 to −.049         53
+.230 to +.211          1      −.050 to −.069         38
+.210 to +.191          1      −.070 to −.089         25
+.190 to +.171          4      −.090 to −.109         10
+.170 to +.151          0      −.110 to −.129          9
+.150 to +.131          3      −.130 to −.149          4
+.130 to +.111          8      −.150 to −.169          5
+.110 to +.091         19      −.170 to −.189          1
+.090 to +.071         18      −.190 to −.209          5
+.070 to +.051         32
+.050 to +.031         49      −.250 to −.269          2
+.030 to +.011         70
+.010 to −.009         84      −.350 to −.369          1

Mean residual = −.004; σ = .066
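The mean and σ printed under the distribution of residuals can be recomputed from the grouped frequencies by taking class midpoints. Below is a short Python sketch. The frequency for the class +.130 to +.111 is illegible in the source. It is assumed here to be 8, the value that brings the total, with the other frequencies as read, to the 528 intercorrelations:

```python
import math

# Frequency distribution of the residuals after nine factors, as
# (upper class limit, lower class limit, frequency).
# Assumption: the illegible frequency for +.130 to +.111 is taken as 8,
# the value that makes the frequencies sum to 528.
RESIDUAL_CLASSES = [
    (0.270, 0.251, 1), (0.250, 0.231, 0), (0.230, 0.211, 1),
    (0.210, 0.191, 1), (0.190, 0.171, 4), (0.170, 0.151, 0),
    (0.150, 0.131, 3), (0.130, 0.111, 8), (0.110, 0.091, 19),
    (0.090, 0.071, 18), (0.070, 0.051, 32), (0.050, 0.031, 49),
    (0.030, 0.011, 70), (0.010, -0.009, 84), (-0.010, -0.029, 85),
    (-0.030, -0.049, 53), (-0.050, -0.069, 38), (-0.070, -0.089, 25),
    (-0.090, -0.109, 10), (-0.110, -0.129, 9), (-0.130, -0.149, 4),
    (-0.150, -0.169, 5), (-0.170, -0.189, 1), (-0.190, -0.209, 5),
    (-0.250, -0.269, 2), (-0.350, -0.369, 1),
]


def grouped_mean_sd(classes):
    """Return (n, mean, standard deviation) of grouped data using class midpoints."""
    n = sum(f for _, _, f in classes)
    mids = [((hi + lo) / 2.0, f) for hi, lo, f in classes]
    mean = sum(m * f for m, f in mids) / n
    var = sum(f * (m - mean) ** 2 for m, f in mids) / n
    return n, mean, math.sqrt(var)


n, mean, sd = grouped_mean_sd(RESIDUAL_CLASSES)
print(f"n = {n}, mean = {mean:+.3f}, sigma = {sd:.3f}")
```

With that assumption, the sketch gives n = 528, a mean of −.004 and σ = .066. These agree with the values printed beneath the distribution.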
The transformation matrix was calculated by the graphic method.
All pairs of axes were rotated until, so far as could be determined by
inspection, no further rotation of any of the possible pairs would
produce an increase in the number of insignificant factor loadings.
Sixty-two rotations of pairs of axes were made. Since there were only
fifty-six cases, the σ of any loading must be large: the σ of an
original correlation of zero when n is fifty-six is .13, and in rotating
axes all loadings of less than twice this amount, i.e., ±.26, were
regarded as small and probably insignificant. Only one negative
loading in excess of this magnitude remained after the rotation of axes;
namely, the loading of −.287 of variable eighteen, gain in speed, with
Factor VI. Some negative loadings are necessitated by the fact that
negative correlations occur in the original matrix.
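Two quantities in this paragraph can be made concrete. The first is the σ of a zero correlation, approximately 1/√n. The second is the rotation of one pair of axes, an orthogonal plane rotation that leaves each test's communality unchanged. The Python sketch below assumes these textbook forms. The graphic method itself chose each angle by inspecting plotted loadings, which is not modelled here. The loadings in the example are hypothetical:

```python
import math


def se_zero_r(n: int) -> float:
    """Approximate standard error of a sample correlation whose true value is zero."""
    return 1.0 / math.sqrt(n)


def rotate_pair(loadings, p, q, theta):
    """Rotate the axes of factors p and q through angle theta (radians).

    loadings: list of rows, one per test. This is an orthogonal plane
    rotation, so each test's communality (row sum of squares) is unchanged.
    """
    c, s = math.cos(theta), math.sin(theta)
    rotated = []
    for row in loadings:
        new = list(row)
        new[p] = c * row[p] + s * row[q]
        new[q] = -s * row[p] + c * row[q]
        rotated.append(new)
    return rotated


se = se_zero_r(56)
print(f"sigma of r = 0 with n = 56: {se:.3f}; twice that: {2 * se:.3f}")

# Hypothetical two-factor loadings for three tests, rotated through 30 degrees.
F = [[0.60, 0.30], [0.50, -0.40], [0.20, 0.70]]
for before, after in zip(F, rotate_pair(F, 0, 1, math.radians(30))):
    print(round(sum(x * x for x in before), 6), round(sum(x * x for x in after), 6))
```

For n = 56, 1/√n is about .134, so twice σ is about .27. The criterion of ±.26 used in the text is twice the rounded value .13.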


TABLE III.—TRANSFORMATION MATRIX

Rows: original factors (Table II); columns: rotated factors (Table IV).

          I     II    III     IV      V     VI    VII   VIII     IX
   I   .533   .319   .294   .227   .416   .383   .198   .206   .282
  II   .541  −.406   .261  −.316  −.491   .219  −.065   .170  −.234
 III   .480   .474  −.472  −.346  −.143  −.302   .165  −.252   .011
  IV   .201  −.001  −.360   .567  −.147   .376  −.420  −.372  −.182
   V   .285  −.518  −.174  −.138   .693  −.220  −.118  −.148  −.194
  VI   .187  −.022   .472   .154  −.121  −.493  −.394  −.340   .432
 VII  −.166   .376   .310  −.494   .219   .311  −.422  −.299  −.280
VIII   .095   .250  −.112   .086   .048  −.297  −.547   .693  −.197
  IX   .068   .181   .366   .347   .000  −.315   .322  −.132  −.697
The transformation matrix calculated from the rotations of pairs
of axes is given as Table III. Each column of the matrix may be
regarded as an equation. For example, if the original nine factor
loadings (Table II) of any test be designated a, b, ..., i, then the new
loading of that test with Factor I will equal .533a + .541b + ... + .068i.
Similarly, the second column of the transformation matrix gives the
equation by which the loadings of any test with the rotated Factor II
may be obtained from the original loadings with all nine factors. In
this way, that is, by multiplying the original factor matrix (Table II)
by the transformation matrix (Table III), the transformed factorial
matrix (Table IV) is obtained (in Thurstone’s notation, FG = V).
The transformation is orthogonal, and, therefore, the transformed
factors (or rotated axes) remain orthogonal. Only the results obtained
after this transformation, or rotation, should be considered as
representing the outcome of the analysis. The data given in Tables II and
III merely represent necessary steps, and are presented solely because
they permit further computations, or further rotations of axes, should
one care to make them.
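The multiplication just described can be checked for a single test. Below is a minimal Python sketch using the loadings of test 27 (Verbal Analogies, B) from Table II and the transformation matrix of Table III, as they are read here. Some signs in the printed tables are unclear and were inferred. The helper names are hypothetical:

```python
# Loadings of test 27 (Verbal Analogies, B) on the nine original factors (Table II).
F27 = [0.517, 0.446, 0.315, 0.198, 0.336, -0.019, -0.116, -0.123, -0.041]

# Table III: row k is original factor k, column m is rotated factor m.
G = [
    [0.533, 0.319, 0.294, 0.227, 0.416, 0.383, 0.198, 0.206, 0.282],
    [0.541, -0.406, 0.261, -0.316, -0.491, 0.219, -0.065, 0.170, -0.234],
    [0.480, 0.474, -0.472, -0.346, -0.143, -0.302, 0.165, -0.252, 0.011],
    [0.201, -0.001, -0.360, 0.567, -0.147, 0.376, -0.420, -0.372, -0.182],
    [0.285, -0.518, -0.174, -0.138, 0.693, -0.220, -0.118, -0.148, -0.194],
    [0.187, -0.022, 0.472, 0.154, -0.121, -0.493, -0.394, -0.340, 0.432],
    [-0.166, 0.376, 0.310, -0.494, 0.219, 0.311, -0.422, -0.299, -0.280],
    [0.095, 0.250, -0.112, 0.086, 0.048, -0.297, -0.547, 0.693, -0.197],
    [0.068, 0.181, 0.366, 0.347, 0.000, -0.315, 0.322, -0.132, -0.697],
]


def rotate_row(f_row, g):
    """One row of V = F G: the test's loadings on the rotated factors."""
    return [sum(f_row[k] * g[k][m] for k in range(len(f_row)))
            for m in range(len(g[0]))]


def column_norms(g):
    """Length of each column of G; all should be close to 1 for an orthogonal G."""
    return [sum(g[k][m] ** 2 for k in range(len(g))) ** 0.5 for m in range(len(g[0]))]


print([round(x, 3) for x in rotate_row(F27, G)])
print([round(x, 3) for x in column_norms(G)])
```

The first printed list should agree with row 27 of Table IV to within about .002 in every column. Every column length of the transformation matrix should be close to 1, as an orthogonal transformation requires.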


TABLE IV.¹—FACTORIAL MATRIX AFTER ROTATION

Test      I     II    III     IV      V     VI    VII   VIII     IX
   1   .132   .137   .132   .239   .243   .448   .006  −.129   .484
   2   .042   .200   .134   .727   .111   .270  −.036  −.022   .497
   3  −.042   .194   .084   .752  −.014   .057   .008   .106   .272
   4   .032   .062   .163   .049   .680   .001  −.088   .222   .276
   5  −.209   .490   .276  −.026   .469  −.031   .155  −.133   .565
   6  −.208   .497   .163  −.005   .026  −.077   .244  −.232   .405
   7   .368  −.074   .512   .051   .318   .281   .177   .124   .083
   8   .120  −.026   .930   .042   .121   .166   .077   .190   .091
   9  −.154   .031   .648   .049  −.143   .000  −.054   .213   .099
  10  −.012   .144  −.007  −.083   .684   .356  −.008   .228   .023
  11   .090   .746  −.067   .077   .511   .119   .261   .158  −.037
  12   .349   .907  −.046   .104   .151  −.069  −.052  −.017  −.022
  13   .158  −.222   .296  −.039   .087   .226  −.158   .739   .080
  14   .003  −.002   .105   .006   .031   .096   .608   .685  −.086
  15  −.166   .044  −.151  −.019  −.014  −.098   .634  −.090  −.117
  16   .268  −.079  −.036   .481   .598   .192   .134  −.124  −.105
  17   .229   .090   .053   .157   .567  −.039   .169   .318   .467
  18  −.043   .009   .198  −.221  −.133  −.287  −.085   .266   .658
  19   .126  −.060  −.008   .061   .032   .945  −.019  −.019  −.044
  20   .150  −.057   .027   .388   .065   .744  −.047  −.005  −.097
  21   .009  −.002   .072   .378   .024  −.066   .002  −.021  −.076
  22   .714   .050  −.025   .002   .172   .243   .100   .038   .106
  23   .676   .246   .274  −.016   .194   .265  −.071   .006   .019
  24   .623   .183   .114  −.051   .025   .252   .291   .098   .310
  25   .574   .067   .256   .066   .211   .043   .288   .293   .128
  26   .822  −.046  −.004   .112   .059   .289   .144  −.113  −.036
  27   .804  −.123  −.056  −.038   .126   .224   .112  −.059   .021
  28   .582   .291  −.006   .020  −.151   .456   .251   .165   .176
  29   .832   .217   .133  −.089   .040   .291   .164   .049   .135
  30   .839   .069   .048   .063   .250   .048  −.103  −.095  −.175
  31   .098   .525  −.190  −.070   .482   .090  −.187   .211   .294
  32   .275   .198   .399   .078  −.155   .295  −.210   .213   .023
  33   .012   .038   .046   .208   .692   .121   .237  −.010  −.018














¹ See explanatory note preceding Table II.


As a matter of fact, numerous further calculations besides those
here described have been made. For example, it was found possible
to obtain factors closely resembling the first five factors of Table V
by a procedure which utilizes sign-changing rules quite different from
those of Thurstone’s centroid method.¹

From the transformed factorial matrix (Table IV), it may be seen 
that every test, with the exception of number 21 (improvement in 
anagrams), has a significant loading (.39 or over) with at least one 
factor and an insignificant loading (under .26) with at least as many as 
five factors. Only one variable, final substitution score, shows load- 
ings over +.39 with as many as three factors. 

To attempt to name the factors is hazardous since conventional 
names ordinarily apply to a total or complete operation, whereas a 
factor, unless it shows one or more loadings approaching unity, is 
only one abstract causal condition acting along with others in the 
determination of goodness of score in any whole operation. More- 
over, since some of the variables such as the Thorndike and Otis scores 
represent composite scores, and since the number of tests was relatively 
small, there is little likelihood that the factors obtained represent 
truly “primary” abilities. Nevertheless, it appears desirable to
attempt roughly to identify the factors. Consequently, it will be 
pointed out with what tests each of the factors shows the highest load- 
ings, and thus in what sort of performances each factor is important. 

Factor I.—Important in tests of intelligence, or in tests such as
opposites and analogies that have been alleged to be good tests of “g.”²
That it is not “verbality” is indicated by the very low correlation
with anagrams; yet the only non-verbal test which correlates as high as
+.39 with this factor is that of form analogies. An attempt was made
to discover some method by which the loading shown by form analogies
with this factor could be reduced to insignificance. None was
discovered.

Factor II.—Important in tests which have not infrequently been
designated tests of attention. The highest correlations of this factor
are with the final and gain scores (11 and 12) of Philip’s multiple-
instruction letter-cancellation test, one of Philip’s battery for measuring
attention. Other tests with high loadings are 3-digit cancellation
(31) and substitution, final and gain scores (5 and 6). On the other
hand, the correlation with horizontal adding, which was also one of the
tests of attention devised by Philip, is insignificant. In the case of the
present group of subjects, this last test appears to be more a computational
or numerical test than an attention test. Possibly this factor could be
termed speed of perception of detail, or, perhaps better, conceived as a
factor in tests of analytic reaction (reactions to items which need to be
attended to separately).

¹ Woodrow, H., and Wilson, L. A.: “A simple method of approximate factor
analysis.” Psychometrika, Vol. II, 1937, pp. 245-258.
² Spearman, C.: The Abilities of Man, 1927.

Factor III.—The only certainly significant correlations are with
the spot-pattern scores. Final spot-pattern score shows the high 
factor loading +.930. 

Factor IV.—Possibly a numerical factor, since the highest
correlations are with final and gain scores in horizontal addition. The
correlations of +.388 and +.378 with final and gain scores in anagrams,
though low and possibly insignificant, make an interpretation of this 
factor somewhat hazardous. The correlation of +.481 with initial 
score in speed of making gates is also hard to explain. 

Factor V.—Rather clearly a speed factor. The significant correla- 
tions are with speed of making gates, initial and final scores, speed 
of making crosses, Philip’s cancellation, initial and final, digit- 
cancellation, and substitution, initial and final. 

Factor VI.—Correlates +.945 with initial anagrams score, but also
shows a correlation of +.448 with initial horizontal adding score. A
puzzling relation between anagrams and horizontal adding thus appears
in the case of two different factors, Factor IV being involved in the final
and gain scores of both tests and Factor VI being particularly
prominent in the initial scores of both tests.

Factors VII and VIII both pertain primarily to the test of estima- 
tion of relative length, Factor VII owing its existence largely to the 
correlation of the final and gain scores and Factor VIII to the correla- 
tion of the initial and final scores. This result illustrates the complica- 
tion resulting from using three different scores, initial, final, and gain, 
derived from one practice test, in the same matrix of correlations. 

Factor IX shows the highest loading in the case of gain in speed. 
Its interpretation may be connected with the meaning of the gain in 
speed scores, but it is not apparent that the other loadings throw any 
light upon that interpretation. 

Certainly great caution should be exercised in drawing any final 
conclusions from the preceding analysis. On account of the small 
number of cases, and the approximations used as communalities, and 
the further fact that a merely graphic method of rotating axes has been 
used, the results should, no doubt, be considered as only a preliminary 
approximation to the truth. Nevertheless, certain facts stand out so 
decisively as to leave little doubt of the validity of certain general 
conclusions which are of value in connection with a number of problems
concerning practice. 

1. Perhaps the most important fact established is that marked
changes in factor loadings occur with practice. In the case of two of the
tests, however, speed and anagrams, the changes are not enormous, and
in the case of anagrams are possibly not significant, though the change
from .945 to .744 in the loading with Factor VI and that from .061 to
.388 with Factor IV are rather pronounced changes. While, then,
it may not be established beyond doubt that practice always produces
significant changes in factor loadings, there can be little doubt that such
changes usually occur. Horizontal adding shows a change from
+.239 to +.727 in the loading with Factor IV. Substitution shows a
change in the loading with Factor II from +.062 in the initial
performances to +.490 in the final performances. At the same time it
shows a drop in the loading with Factor V from +.680 to +.469.
The spot-pattern test shows a rise from +.512 to +.930 with Factor
III. Philip’s cancellation test shows an increase from initial to final
score in the Factor II loading from +.144 to +.746. Estimation of
relative length changes in loading with Factor VII from −.158 to
+.608. Speed of making gates changes from +.481 to +.157 in
its loading with Factor IV and from −.105 to +.467 in its loading
with Factor IX.

That these large changes in the factor loadings are due to practice 
is indicated by the absence of such changes in the case of the three 
non-practice tests—artificial language, form-analogies, and verbal 
analogies—given twice, before and after practice. The two scores of 
each of these tests show variations in their factor loadings of only 
that magnitude which has here been considered unreliable, and the 
largest change shown in the loadings of any of these three tests with 
any of the factors; namely, the change from −.025 to +.274 in the
loading of artificial language with Factor III, is far less significant than 
the changes shown by the tests in which practice was given. 

It is interesting to observe that a recent analysis made by the 
Hotelling method¹ harmonizes well with the finding of marked changes
in factor loadings, since it shows that decided changes in the weights of 
the various principal components resulted from a brief period of 
instruction concerning various helpful devices, this instruction being 
interpolated between the initial and final administration of the tests. 





¹ Anastasi, A.: “The influence of specific experience upon mental
organization.” Genetic Psychol. Monog., Vol. XVIII, No. 4, 1936, pp. 245-355.
In general these changes with practice in the factor loadings mean
that the quantitative pattern of abilities determining goodness of
performance changes with practice, i.e., a performance after practice
is likely to depend for its success more on one ability or less on another 
than it did initially. Such a change must mean a change in the mode 
of operation whereby the subject carries out the task he has been 
instructed to accomplish. 

In a sense the task which the subject practices remains the same. 
There is no change either in the instructions, that is, in the task the
subject is asked to perform, or in the manner in which the experimenter
scores the records made on the test-papers. If the goodness of
the scores be regarded as determined by a set of coöperating but
independently variable abilities, then practice may be regarded as a
change in the conditions under which the various constituents of this
set of abilities operate. A fixed amount of practice may be regarded
as a fixed change in conditions; but this fixed change in the total
constellation of conditions does not result in an equal increase in
favorableness for the operation of all the coöperating determining
factors.

2. There is no general tendency for the loading with Factor I or
for the average r with four “intelligence” tests to be larger in the
case of final scores than initial scores. In fact, the tendency is
rather in the opposite direction, though the changes are small. Only
in the case of Philip’s cancellation test does final score have a
higher positive factor loading than initial score. In the case of Factor 
V, also, a factor which is rather clearly a speed factor, the loadings tend 
to decrease with practice. In none of the seven practice tests does 
final score show a significantly higher loading with the speed factor 
than does initial score. In fact, the final score loading is smaller than 
the initial score loading for all tests except anagrams, in which case 
both the initial and final loadings are insignificant (+.032 and +.065). 
If, then, initial and final measures are equally reliable, as is here the 
case, it is not true that there is any general tendency for test-perform- 
ances to become with practice more dependent upon or better measures 
of some supposedly common factor, such as “g,” intelligence, or speed.
One reason why test scores have sometimes been supposed to do so is 
probably the fact that as a rule previous investigators have not used 
initial scores of as high reliability as the final scores. 

3. There is no sign of any general improvement factor, that is, 
a factor common to the gain scores of all the practice tests. It is 
particularly noteworthy that Factor I, which somewhat resembles
Spearman’s “g” factor, shows little if any correlation with any of the
gain scores. Only in one case, that of multiple-instruction cancella- 
tion, could the loading with this factor (+.349) possibly be regarded as 
significant. Factor I loadings resemble the average correlations of 
each variable with four variables which, taken together, may be 
regarded as measuring “intelligence” as it is commonly conceived. 
These four variables are the Thorndike CAVD, the pool of six of the 
Otis tests, forms A and B, verbal analogies, and artificial language. 
The average correlation of these four scores with the gain scores is 
negligible with the exception of the multiple-instruction cancellation
test, in which case it is +.381. Perhaps equally interesting is the fact 
that none of the gain scores shows a significant loading with Factor V, 
which is regarded as a speed factor. Not even in the speed of making 
gates test were the gain scores correlated with the speed factor. 

4. The factor loadings of the gain scores depend largely upon the 
factor loadings of the initial and final scores. Gain scores usually 
correlate highly with final scores, whereas their correlation with initial 
scores seems to fluctuate, widely it is true, about zero or a small nega- 
tive value. To a certain extent, undoubtedly, the higher correlation 
of gain scores with final than with initial scores is due to errors of 
measurement, the errors in the final scores being added to, and those 
in the initial scores being subtracted from, the gain scores. When the 
reliability of initial and final scores is as high as in the present study, 
however, the effect of errors of measurement cannot be a major factor. 
The reason why gain scores correlate higher with final than with initial
scores is simply that a gain score is computed as final score minus
initial score. Consequently, gain scores vary directly with
final scores but inversely with initial scores. In view of this fact,
when initial and final scores differ considerably in their loading with a 
given factor, one would expect to find the loading of the gain score 
follow that shown by the final score. In such a case, if an individual 
possessed in high degree an ability which was important for initial 
score but not important for final score, the possession of that ability 
to a high degree would not tend to result in a high gain score. On 
the other hand, a high degree of an ability entering more importantly 
into final than into initial score would almost guarantee a high gain 
score. 
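The argument can be illustrated under the classical model, in which each observed score is a true score plus an independent error. The Python sketch below computes the expected correlations of the gain with initial and with final scores. The model, the function name, and the true initial-final correlations used in the examples (.80 and .60) are assumptions for illustration. The standard deviations in the last example are those reported for horizontal adding in Table I, treated here as true-score values:

```python
import math


def _error_var(true_sd: float, reliability: float) -> float:
    """Error variance implied by a reliability coefficient: var(T) (1 - r) / r."""
    return true_sd ** 2 * (1.0 - reliability) / reliability


def gain_correlations(s1, s2, rho, rel1=1.0, rel2=1.0):
    """Expected correlations of the gain (final - initial) with the observed
    initial and final scores, assuming X = T + e with errors independent of
    true scores and of each other.

    s1, s2: true-score SDs of initial and final scores
    rho:    true correlation between initial and final scores
    rel1, rel2: reliabilities of the observed initial and final scores
    Returns (r_initial_gain, r_final_gain).
    """
    v1, v2 = _error_var(s1, rel1), _error_var(s2, rel2)
    var_x1, var_x2 = s1 ** 2 + v1, s2 ** 2 + v2
    cov_true = rho * s1 * s2  # cov(X1, X2): the errors are uncorrelated
    sd_gain = math.sqrt(var_x1 + var_x2 - 2.0 * cov_true)
    r_initial_gain = (cov_true - var_x1) / (math.sqrt(var_x1) * sd_gain)
    r_final_gain = (var_x2 - cov_true) / (math.sqrt(var_x2) * sd_gain)
    return r_initial_gain, r_final_gain


# Hypothetical: equal true SDs, true initial-final correlation .80.
print(gain_correlations(10, 10, 0.80))              # perfectly reliable scores
print(gain_correlations(10, 10, 0.80, 0.94, 0.94))  # reliabilities near those in Table I
# Final scores more variable than initial (SDs as for horizontal adding).
print(gain_correlations(11.3, 21.2, 0.60))
```

The sketch shows three cases:
- With perfectly reliable scores and equal variability, the gain correlates about −.32 with initial score and +.32 with final score, purely because of how it is defined.
- Adding error of the size implied by reliabilities of .94 pushes the first correlation further below zero and the second further above it.
- When the final scores are much more variable than the initial ones, as in horizontal adding, the gain correlates highly with final score and only slightly with initial score.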

The results of the factor analysis harmonize well with the preceding 
considerations. The seven instances of most marked increase in final 
over initial loading all show significant gain-score loadings. For
example, horizontal adding in the case of Factor IV shows an initial 
loading of +.239 and a final loading of +.727, and the gain-score 
loading is +.752. On the other hand, a high initial loading, particu- 
larly in the absence of an equally high final loading, tends to result in a 
low gain-score loading. There are ten instances, representing every 
one of the seven practice tests, in which the initial loading is +.296 
or higher and the final loading is lower than the initial one. In all 
ten instances, the gain-score loadings are insignificant. For example, 
horizontal adding shows in the case of Factor VI an initial loading of 
+.448 and a final loading of +.270, and the gain-score loading is 
+.057. A more striking case is afforded by anagrams. This test 
shows an initial loading with Factor VI of +.945 and a final loading of 
+.744, and the gain-score loading is -.066. The interesting conclusion
then appears to be clearly established that, in the case of none
of the seven tests here used, does the amount possessed of an initially 
important ability have any bearing upon the change with practice in 
an individual’s standard score. The possession of such an ability to 
a high degree creates no likelihood of a greater than average or smaller 
than average gain. Even the possession of a high degree of an ability 
which shows a high final loading does not necessarily result in a high 
gain score. It will do so only providing the ability in question is less 
important in determining initial score than in determining final score. 
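The argument can be made concrete under an idealized model. The following is an assumption for illustration, not the author's computing procedure:

- the gain G is the difference of standardized final and initial scores, F and I;
- f is a unit-variance factor, uncorrelated with the remaining parts of each score;
- the loadings a_I and a_F are correlations with f.

Under these assumptions the gain-score loading depends on the difference between the final and initial loadings. A high initial loading that is not matched by the final loading therefore gives a small or negative gain loading. The loadings reported above come from the author's own analysis and need not agree exactly with this formula.

```latex
\begin{aligned}
I &= a_I f + e_I, \qquad F = a_F f + e_F, \qquad G = F - I,\\
\operatorname{Var}(G) &= \operatorname{Var}(F) + \operatorname{Var}(I) - 2\operatorname{Cov}(I,F) = 2\,(1 - r_{IF}),\\
\operatorname{Cov}(G, f) &= \operatorname{Cov}(F, f) - \operatorname{Cov}(I, f) = a_F - a_I,\\
\lambda_G &= \frac{\operatorname{Cov}(G, f)}{\sqrt{\operatorname{Var}(G)}} = \frac{a_F - a_I}{\sqrt{2\,(1 - r_{IF})}}.
\end{aligned}
```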

5. Although no factor common to all or even a majority of the gain 
scores was discovered, several factors show loadings with the gain scores 
of more than one test. Factor II correlates +.497 with improvement
in substitution and +.907 with improvement in Philip’s cancellation; 
and Factor IX correlates +.405 with improvement in substitution 
and +.658 with improvement in speed of making gates. And Factor 
IV shows a high correlation with improvement in horizontal addition 
(+.752) and one which may not be negligible (+.378) with improve- 
ment in anagrams. It seems probable that if two tests have a common 
factor in their gain scores, practice in one would result in transfer to 
the other. Since, however, a high gain score loading appears always 
to be accompanied by a high final score loading, it is not likely that 
two tests will both show a high gain score loading with the same factor, 
unless this factor is also an important determinant of final score in 
both tests, and, further, unless it is a more important determinant 
of final than of initial score. It seems unlikely, then, that valid 
predictions as regards transference can be made by a factor analysis 
of correlations of scores on tests given but once, or tests from which
only a single score is used. Thus, for example, there is no reason for 
predicting transference from practice in Philip’s cancellation to speed 
of making gates. While both these tests show a significant initial 
correlation with Factor V (+.684 and +.598), they do not both show 
a high final correlation with this or any other factor. Naturally, 
therefore, in view of considerations which have been outlined above, 
their gain scores show no sizeable correlation in common with any 
factor. On the other hand, transference might well be expected from 
practice in Philip’s cancellation to substitution. This expectation 
would be based not on a high common initial loading (which, inci- 
dentally, they have with Factor V) but on the fact that final scores 
in both tests show a high loading with Factor II, and the further fact 
that this factor is altogether inconsequential in the initial scores of 
both tests. On account of the marked increase with practice in the 
loadings with Factor II, both tests also show marked correlation in 
their gain scores with this factor. This appears to be the state of 
affairs which would lead to transference of training. It would seem 
reasonably safe, then, to predict that in a group of subjects similar 
to the one here used, practice in Philip’s cancellation would show 
transference to letter-digit substitution but not to speed of making 
gates. 
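The rule stated here is that transfer is to be expected between two tests whose gain scores load on a common factor. It can be applied mechanically to the gain-score loadings quoted in this section. The sketch is illustrative only:

- The cut-off of .35 for a "significant" gain-score loading is a hypothetical choice; the criterion used in the study is not given in this passage.
- Only loadings actually quoted in the text are entered.
- The names `GAIN_LOADINGS` and `shared_gain_factors` are invented for the example.

```python
from itertools import combinations

# Gain-score loadings quoted in the text: test -> {factor: loading}.
# Loadings not quoted in the text are simply omitted.
GAIN_LOADINGS = {
    "substitution": {"II": 0.497, "IX": 0.405},
    "Philip's cancellation": {"II": 0.907},
    "speed of making gates": {"IX": 0.658},
    "horizontal adding": {"IV": 0.752, "VI": 0.057},
    "anagrams": {"IV": 0.378, "VI": -0.066},
}

# Hypothetical cut-off for a "significant" gain-score loading.
CUTOFF = 0.35


def shared_gain_factors(loadings, cutoff=CUTOFF):
    """Return {(test_a, test_b): [factors]} for every pair of tests whose
    gain scores both load at or above `cutoff` on the same factor."""
    pairs = {}
    for a, b in combinations(sorted(loadings), 2):
        common = sorted(
            f for f in loadings[a].keys() & loadings[b].keys()
            if loadings[a][f] >= cutoff and loadings[b][f] >= cutoff
        )
        if common:
            pairs[(a, b)] = common
    return pairs


for (a, b), factors in shared_gain_factors(GAIN_LOADINGS).items():
    print(f"{a} <-> {b}: Factor {', '.join(factors)}")
```

The sketch flags three pairs:

- Philip's cancellation with substitution (Factor II);
- horizontal adding with anagrams (Factor IV);
- speed of making gates with substitution (Factor IX).

It does not flag Philip's cancellation with speed of making gates, in keeping with the prediction above. The rule can only be applied once gain scores are available. It cannot be read off from tests given but once.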








A COMPARATIVE STUDY OF THE BRIEF, THE PRÉCIS,
AND THE ESSAY WITH RESPECT TO SPEED OF 
READING AND EASE OF LEARNING 


HAROLD BORAAS 


Alfred University 


Writers and teachers have apparently used the brief, the précis, 
and the essay with little knowledge of their relative effectiveness. 
Few, if any, investigators have undertaken to measure the speed and 
effectiveness with which these various forms of printed composition 
will convey thoughts to the mind of the reader. It is the purpose of 
this article to present the results of a comparison of these types of 
composition with respect to speed of reading and ease of learning. 

The brief, the précis, and the essay are quite different in structure. 
The brief is generally in some outline form, setting out the main ideas 
in an orderly, easily perceived manner. Central ideas, with their 
associated subpoints, are set forth in such a way as to give the gist of 
the thought. The précis is a concentrated statement summarizing a 
longer paragraph. Its sentences are short and concise. It appears to 
be for a paragraph what a good summary is for a chapter. The essay 
is a composition written with a view to interpreting and analyzing its 
subject as understood by the author. It displays some literary merit. 


BRIEF AND PRÉCIS


Procedure.—Two groups, consisting of college juniors and seniors,
were equated as to intelligence and achievement. Both groups showed
an average achievement index² of 1.4, and an average percentile rating
in intelligence³ of 52.

The subject-matter used was taken from an article entitled “The
Little Man Who Shook the World.”⁴ A brief was first made of the
article; then a précis was written. An objective test, consisting of
eight true-false statements, four completions, and two multiple-choice
items, was used for evaluating the effectiveness of the reading of the
brief or the précis. In the administration of the test, the only direction
given to the subjects was, “Read to get facts and meanings.” Group I
was given the brief, and Group II was given the précis. There were
twenty-six subjects in each of the equated groups. The time allowed
for reading the brief and the précis was eight minutes. This was found
to be sufficient for all subjects. A time limit of four minutes was used
for the test.

1 From Alfred University, Alfred, New York.

2 Cumulative index, from the freshman year on. A = 3, C = 1, F = -2.

3 Taken from the Thurstone Psychological Examination for college freshmen.

4 Found in the World Digest, Dec., 1936, p. 11. The subjects were not familiar
with this article.

Results.—The test papers were marked on the basis of one point for 
each item, giving a maximum score of fourteen points. The results of 
the study are shown in Table I. 


TABLE I.—COMPARISON OF THE BRIEF AND THE PRÉCIS AS TO EASE OF LEARNING
(Alfred)

[Column headings: Points; Number; Average. Table entries not recoverable.]

The brief shows an average score somewhat higher than that of the
précis. The reading or learning effectiveness, therefore, is greater for 
the brief than for the précis. 


BRIEF AND ESSAY 


Procedure.—The procedure for testing the learning effectiveness of 
the brief and the essay involved the same equated groups used in the 
study of the brief and the précis. However, in this case, speed of 
reading and a period of rereading were added features. Each subject 
noted his own reading and rereading time according to a time indication 
on the blackboard in the front of the room. The experimenter placed 
a number on the board which represented each ten-second period. 
The subject, then, merely recorded on his paper the time used for his 
reading. 

The subject-matter used was taken from an article entitled “Opportunity,”
a radio talk by W. J. Cameron over the Columbia Broadcasting
System from Detroit as a part of the Ford Sunday Evening
Hour, on December 13, 1936. This talk was in essay form. A brief
was made of the article setting forth its main thoughts. A test,
consisting of fifteen completion items, was used to measure the learning
effectiveness of the brief and the essay. The instruction to the
subjects was, “Read to get facts and meanings.” Group I was given the
essay to read, and Group II was given the brief.
The test was not timed, although it was found that seven minutes
was long enough for all to answer the given questions.


TABLE II.—COMPARISON OF THE BRIEF AND ESSAY AS TO SPEED OF READING
AND EASE OF LEARNING
(Alfred)

Time used for reading (seconds)

            Essay                          Brief
   Seconds     Frequency         Seconds     Frequency
     640           1               260           1
     600           1               230           2
     590           2               220           1
     580           1               210           4
     570           1               200           1
     560           1               190           5
     550           1               180           2
     510           2               170           2
     480           4               160           3
     470           1               150           1
     450           4               140           4
     440           2                90           1
     430           1
     370           4
     350           1
   Totals         27                             27
   Average       483                            182

Ease of learning (test score)

   Score     Essay frequency     Brief frequency
     20                                 1
     19             1                   3
     18             1
     17             1                   2
     16             3                   4
     15             6                   2
     14             5                   2
     13             3                   4
     12             2                   3
     11             1                   3
     10                                 1
      9             2                   1
      7             2
      6                                 1
   Totals          27                  27
   Average        13.6                14.0

Results.—In this study, speed of reading and rereading was measured
in terms of seconds, whereas the learning effectiveness was measured
in terms of a completion test involving twenty-five points. The
results are shown in Table II.
For time of reading, the brief is very definitely superior to the essay.
The essay takes about two and one-half times as long to read as does 
the brief. The slowest reader of the brief (260 seconds) read the mate- 
rial in about one hundred seconds less time than did the fastest reader 
of the essay (350 seconds). The range of reading times is considerably greater
for the readers of essays than for readers of briefs. This indicates
that the essay offers some reading difficulties for the slower readers, 
whereas the brief is read in about the same time by both slow and fast 
readers. The brief, then, might prove advantageous in situations 
requiring that all pupils read a certain amount of material within a 
given time. 
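The comparison of reading times rests on averages of grouped frequency distributions. As a check, the sketch below recomputes the essay average from the essay reading times and frequencies in Table II. It then divides that average by the reported brief average of 182 seconds. The dictionary layout and the helper name `grouped_mean` are illustrative choices, not part of the study.

```python
# Essay reading times (seconds) -> number of readers, from Table II (Alfred).
essay_times = {640: 1, 600: 1, 590: 2, 580: 1, 570: 1, 560: 1, 550: 1,
               510: 2, 480: 4, 470: 1, 450: 4, 440: 2, 430: 1, 370: 4, 350: 1}

BRIEF_MEAN = 182  # average brief reading time as reported in Table II


def grouped_mean(freqs):
    """Mean of a frequency distribution given as {value: count}; also returns N."""
    n = sum(freqs.values())
    return sum(value * count for value, count in freqs.items()) / n, n


essay_mean, n = grouped_mean(essay_times)
print(n, round(essay_mean), round(essay_mean / BRIEF_MEAN, 2))  # 27 483 2.65
```

The essay average reproduces the tabled 483 seconds. The resulting ratio of about 2.65 fits both statements in the text: the essay takes roughly two and one-half times as long to read, and the brief is read in less than one-half the time of the essay.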

For ease of learning, no significant difference seems to obtain. The 
scores for brief and essay are about the same. The distribution of the
scores is also much the same in both cases. The superiority of
the brief becomes manifest when we consider the speed of reading. 
The brief is read in less than one-half the time of the essay, and it main- 
tains a learning effectiveness equal to that of the essay. Therefore, 
the brief is favored when quick and efficient perceptual and conceptual 
products are required. 

Additional Results with Respect to the Brief and Essay.—In a study 
of the brief and the essay at St. Olaf College,¹ involving the same
materials and procedures as found in this study, the results were as 
follows: 


TABLE III.—COMPARISON OF THE BRIEF AND ESSAY AS TO SPEED OF READING
AND EASE OF LEARNING
(St. Olaf)

   Time of reading (seconds)       Ease of learning (test score)
     Essay         Brief              Essay         Brief
      260           160                 16            17

Number of cases: Essay, twenty-two; Brief, twenty-nine.


It is apparent that, although the St. Olaf scores are somewhat better 
than those of the present study, the relative differences between scores 
are approximately the same. 

Another study of the brief and essay was made at St. Olaf College.
The same material was used as was given in the foregoing studies. No
rereading, however, was required. The results are shown in Table IV.
The reading times are somewhat lower than in the foregoing studies.
This, of course, is due to the absence of the “rereading” requirement.
The relative differences between scores, however, are much the same
as in the previous studies. Again, the brief shows the better record
when both speed and learning effectiveness are considered.

1 St. Olaf College, Northfield, Minn., a college of one thousand students. Dr.
Julius Boraas conducted the study.

TABLE IV.—COMPARISON OF THE BRIEF AND ESSAY WHEN NO REREADING WAS
REQUIRED

   Time of reading (seconds)       Ease of learning (test score)
     Essay         Brief              Essay         Brief
      237           111                16.5          17.6

Number of cases: Essay, forty-six; Brief, forty-one.

Retests.—In order to determine the amount of loss of learning and
to evaluate the relative changes in scores, a retest was given the
students at Alfred University one month after the original test had
been given. The same test as was used in the original study was
employed in the retest study. The comparative results are shown
in Table V.


TABLE V.—COMPARISON OF SCORES FOR EASE OF LEARNING FOR THE BRIEF AND
ESSAY FOR THE ORIGINAL TEST AND FOR A TEST ONE MONTH LATER
(Alfred)

            Original test score    Test score one month    Difference
                 (average)            later (average)
   Essay           13.6                   8.39                -5.21
   Brief           14.0                   9.69                -4.31

The loss in learning in the case of the essay is greater than for the brief.
Furthermore, the superiority of the brief over the essay increases with
the passage of time. These facts, together with the superiority of the
brief in speed, point strongly in the direction of greater efficiency of
the brief as compared with the essay.
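The figures in Table V can also be restated as proportions retained after one month. The sketch below uses only the four averages in the table. The percentage retained is an added summary for illustration, not a measure reported in the study.

```python
# Average ease-of-learning scores for the Alfred groups (Table V).
scores = {
    "essay": {"original": 13.6, "one_month": 8.39},
    "brief": {"original": 14.0, "one_month": 9.69},
}

for form, s in scores.items():
    loss = s["original"] - s["one_month"]
    retained = s["one_month"] / s["original"]
    print(f"{form}: loss {loss:.2f} points, {retained:.0%} retained")

# Advantage of the brief over the essay, before and after the one-month interval.
gap_before = scores["brief"]["original"] - scores["essay"]["original"]
gap_after = scores["brief"]["one_month"] - scores["essay"]["one_month"]
print(f"brief minus essay: {gap_before:.2f} -> {gap_after:.2f}")
```

The sketch reproduces the losses of 5.21 and 4.31 points. Expressed as proportions, the essay group retained about 62 per cent of its average score and the brief group about 69 per cent. The brief's advantage grows from 0.40 to 1.30 points.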


CONCLUSIONS 


With respect to the brief and the précis, the brief is slightly superior
to the précis in learning effectiveness. This may be due to the
perceptual clearness which the brief offers. The main points stand out,
subpoints are clearly differentiated, and “locational” images are
easily formed.

With respect to the brief and the essay, the brief can be read in 
about one-half of the time needed for the essay type of composition. 
Furthermore, the brief is as effective as the essay for learning. 

In a consideration of the brief, the précis, and the essay, the brief is 
superior to the précis or essay with respect to both speed of reading 
and ease of learning. 

Teachers of all kinds should consider the efficiency of the brief 
when speed of reading and learning effectiveness are stressed. Much 
of the material presented by the teacher can and should be in the form 
of briefs. Many of the reports and papers of the student should likewise
be presented in the form of briefs.

It can be argued that the brief does not offer the student the full 
possibilities of writing complete sentences and of providing connective 
tissue in the writing of an article. However, it cannot be denied that 
the brief provides for quick and efficient learning and recall. Writing 
is not merely for the benefit of the writer, but it is done largely to 
convey ideas as speedily and effectively as possible to the reader. 

It is probable that newspapers and magazines could present their 
materials in “brief” form without any sacrifice of interest. A gain in
comprehension might be expected. 

In the realm of textbooks, articles, and books of readings, it is also 
probable that materials could be presented in the form of briefs without 
any loss of comprehension, and with much gain in speed of reading. 








BOOK REVIEWS 


ARTHUR LEONARD ODENWELLER. Predicting the Quality of Teaching. 
New York: Bureau of Publications, Teachers College, Columbia 
University, 1936, pp. 158. 


This investigation does not differ greatly from a large number of 
earlier studies of the factors affecting teaching success. The same
very questionable assumptions are made and the same techniques used 
—and these are defended with an aggressiveness that implies anticipa- 
tion of criticism. Five hundred and sixty Cleveland elementary- 
school teachers were ranked in terms of their effectiveness by three 
persons—a principal, his assistant, and a supervisor. This ranking 
constituted the basic criterion with which all other measures were 
compared to determine their bearing upon the quality of teaching. 

The fact that the validity of this criterion has never been
established did not trouble the author. He remarks (p. 96): “A valid,
objective measure of effectiveness would be a distinct advantage, but
the things that have been validated are too few and insecure for an
absence of validation to be disconcerting.” Despite this assurance the
reviewer was disconcerted. There would seem to be very little reason
to investigate the factors affecting teaching success if our measure of
the latter is of unknown validity. The following quotation was
intended as an argument in defense of a poor criterion, but it, too, is
not convincing (p. 97): “There is nothing peculiarly bad about lack
of validity in the criterion when what the teacher teaches is not
validated, her method is not validated, and validated methods for
measuring the achievement of her pupils are generally wanting.”
Or to paraphrase this argument: “None of the other fellows have used
a valid criterion, so why should I?”

Most of the reported correlations between estimated teaching 
success and other factors were low. There was a rather close relation- 
ship between personality rating and a rating of teaching effectiveness 
(r = +.825), but the same persons were responsible for both estimates. 
All other r’s, with the exception of one between estimated teaching 
effectiveness and personality ratings made by teachers in the same 
building (r = +.53), were +.30 or below. The value of such correla- 
tions for the purpose of predicting the success as teachers of particular 
persons is very slight. 
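The reviewer's judgment that such correlations have little predictive value can be made concrete with three standard quantities. The first is r², the proportion of variance in common. The second is the coefficient of alienation, k = √(1 − r²). The third is the index of forecasting efficiency, E = 1 − k, which is the proportion by which errors of prediction are reduced compared with guessing the mean. The sketch below applies them to the correlations quoted in the review. It is an added illustration, not part of the review.

```python
from math import sqrt

# Correlations with estimated teaching effectiveness quoted in the review.
reported = {
    "personality rating (same raters)": 0.825,
    "personality rating by fellow teachers": 0.53,
    "highest of the remaining r's": 0.30,
}


def forecasting_summary(r):
    """Return (r squared, coefficient of alienation k, forecasting efficiency E)."""
    k = sqrt(1 - r * r)
    return r * r, k, 1 - k


for label, r in reported.items():
    r2, k, e = forecasting_summary(r)
    print(f"r = {r:.3f}: r^2 = {r2:.2f}, k = {k:.3f}, E = {e:.1%}  ({label})")
```

For r = .30 the index of forecasting efficiency is under 5 per cent. Even the correlation of .825, obtained from the same raters, reduces errors of prediction by less than one-half.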

It is unfortunate that the results of this investigation were reported
so long after the data were gathered. Apparently the teachers were
rated in 1928-1929, the foreword was written by William C. Bagley
in 1934, and the monograph carries the copyright date 1936. The most
recent reference in the very limited bibliography (fifteen titles)
was Kriner’s study published in 1931. STEPHEN M. COREY.
University of Wisconsin.


J. E. WALLACE WALLIN. Personality Maladjustments and Mental
Hygiene. New York: McGraw-Hill Book Company, pp. 511. 


Personality Maladjustments and Mental Hygiene purports to be a 
textbook for psychologists, educators, counsellors, and mental hygiene 
workers. The general nature of its content and its organization are 
described and the style of writing illustrated in the following quotation 
from the preface: “The best approach to an understanding of the
problems of mental health and mental hygiene is to begin with a 
preliminary exposition of the positive concept of mental health and the 
wholesome personality, the different objectives and factors of the 
mental-hygiene program, and the types of cases with which mental 
hygiene is concerned, and then proceed to a detailed discussion of the 
symptoms of personality maladjustments as they are revealed in the 
numerous faulty and unwholesome reaction patterns that unadjusted 
or poorly adjusted people, and even apparently well-adjusted people, 
utilize in the effort to solve their problems, indicating the evils and 
possible virtues of each mode of inadequate response and the remedial 
measures required to correct it.” 

The book is divided into two parts. Part One concerns itself with 
foundational concepts. It is composed of four chapters entitled: 
The Concept of Mental Health and Mental Hygiene; The Remedial,
Preventive, and Positive Objectives of the Mental-Hygiene Program;
The Physical, Psychological, Social, and Educational Factors or 
Elements of the Mental-Hygiene Program; and, Types of Children with 
Which Mental Hygiene is Concerned. Part Two, which in a sense is 
the heart of the book, is captioned by the author as follows: “Symptoms
of Personality Maladjustment as Evidenced by Inadequate or Unwholesome
Modes of Response to Difficulties. Specific Types of Faulty
Methods of Solving Life’s Problems, With Preventive and Remedial
Suggestions.” Topics considered in this portion of the book are:
Nature of inferiority feelings and defense mechanisms, trial and error 
adjustments, regressive and day dreaming adjustments, compensatory 
reactions, mental conflicts and dissociations, resolutions of mental 
conflicts by inhibition and repression, solution of difficulties by
substitution and sublimation, with suggestions from psychoanalysis for
child training. There is a ten-page appendix on suggestions for
overcoming stage fright and other forms of fear. The book has a complete
bibliography and index. Some of the references are marked with
asterisks. These are supposed to be of special value to teachers and
educators.

The book has in it many surprises. It has a surprisingly large
number of fragments from biographies selected by Wallin from his
students’ work. These have been selected to illustrate all forms of
personality deviations. Some notion of the nature of the biographical
illustrations used can be obtained from the captions given them. One
illustrative caption is: “Sixteen Years of Worries, Dreads, Harrowing
Dreams, and Dire Anticipations of Mother’s Death Precipitated by
Father’s Sudden Demise, Terminated by Mother’s Sudden and
Unexpected Collapse in Respondent’s Arms, With Result that
Respondent Resolved Never to Worry Again.”

Even more surprising is Wallin’s use of Edgar Guest as a final
authority on virtues that can be approved and vices that are to be
condemned by himself as a mental hygienist. The natural fruits of
feelings of hatred and jealousy are, for example, illustrated by Wallin
through Edgar Guest’s poem, “He Gave Himself to Hate.” The
attitude that one is supposed to take towards failure is again,
according to Wallin, best expressed by Edgar Guest’s poem beginning, “If
in the end all things prove well, What matter failures here and there.”
Surprising also is his very uncritical, oversimplified, and naive
description of Freudian concepts and treatment. The quantitative studies of
repression in psychological literature are not mentioned by Wallin.
No complete case histories are considered.

In spite of these omissions, however, and even in spite of Wallin’s 
personal literary taste for Edgar Guest, the book has in it much content 
that is worth reading and from it teachers and parents can gain knowl- 
edge and insight that will definitely help them. H. MELTZER. 

Psychological Service Center, St. Louis. 


EDWARD L. THORNDIKE. The Teaching of Controversial Subjects.
Cambridge: Harvard University Press, 1937, pp. 39. 


This is the 1937 Inglis Lecture in Secondary Education at Harvard.
After defining controversial in terms of relative uniformity in the
holding of opinions, the author considers how teaching may be made
more reasonable and beneficial than at present whether the school 
limits itself to the demonstrably true or tries to discuss freely all 
important problems. It is suggested that in teaching controversial 
subjects the methods of science rather than those of emotion,
discussion, and persuasion shall prevail. “Schools should lead pupils to
weigh evidence, not to be moved by it.” There should be statements
of relevant facts and opinions, weights assigned, and a quantitative 
determination of the probabilities that may be derived from these. 
This treatment of controversial questions can result in nothing but 
good. The conclusion is that competent teachers in high schools 
should be encouraged to plan scientific treatments of such questions. 
There will be only whole-hearted approval from many educa- 
tionalists for the author’s viewpoint. There may be some difficulty, 
however, in finding enough secondary-school teachers capable of 
handling the techniques advocated. MILES A. TINKER.
University of Minnesota. 





